Metropolitan New York Library Council
Abraham Lincoln Brigade Archives, New York University
American Museum of Natural History
Barnard College
Center for Jewish History
Central/Western Massachusetts Automated Resource Sharing System
Central Massachusetts Regional Library System
Chelsea District Library, Michigan
Clemson University
Colorado State University
Columbia University
County of Brant Public Library
Fort Hays State University
Illinois State University
Jewish Theological Seminary
Kansas Cosmosphere and Space Center
Leo Baeck Institute
Louisiana State University
Louisiana State University Health Sciences Center
Massachusetts Regional Library System
Metropolitan New York Library Council
Middle Tennessee State University
Museum of Seminole County History in Sanford, Florida
New York Public Library
Orange County Library System, Florida
Orange County Regional History Center, Florida
Oregon State University
Pratt Institute
Queens College, City University of New York
Rollins College
Rutgers University
Southeastern New York Library Resources Council
Southern Illinois University Edwardsville
Southern New Hampshire University
University of Alabama
University of Central Florida
University of Iowa
University of Massachusetts Boston
Western Carolina University
Western Massachusetts Regional Library System
Wisconsin Historical Society
Yale University
Digitization in the
Real World
Lessons Learned from
Small and Medium-Sized
Digitization Projects
Edited by
Kwong Bor Ng & Jason Kucsma
Metropolitan New York Library Council
Published in the United States of America by
Metropolitan New York Library Council
57 East 11th Street, 4th floor
New York, NY 10003-4605
p: (212) 228-2320 f: (212) 228-2598
Web site: http://www.metro.org
ISBN: 978-0-615-37998-2
Cover Design: Jason Kucsma (illustration by Smartone Design,
licensed via iStockphoto.com)
Reviewers Committee: Mark F. Anderson, Jill Annitto, Anna Craft, Jody
DeRidder, Renate Evers, Wei Fang, Maureen M. Knapp, Sue Kunda,
Mandy Mastrovita, Ken Middleton, Emily Pfotenhauer, Mark Phillipson,
Alice Piatt, Mary Z. Rose, Stacy Schiff, Jennifer Weintraub, Andrew
Weiss.
Copyright © 2010 by Metropolitan New York Library Council. No part of
this book may be reproduced in any form or by any means, electronic or
mechanical, including photocopying, without written permission from
the publisher.
The views expressed in this book are those of the authors, but not
necessarily those of the publisher.
About the Editors
Kwong Bor Ng (kbng@qc.cuny.edu) is an associate professor at the
Graduate School of Library and Information Studies, Queens College,
CUNY. His research interests are in the technical and technological
areas of knowledge representation, organization, storage and retrieval.
Dr. Ng has published more than 20 academic papers in refereed
journals. His most recent publications include Using XML: A
How-to-do-it Manual and CD-ROM for Librarians (published by
Neal-Schuman in 2007) and Collaborative Technologies and Applications
for Interactive Information Design: Emerging Trends in User
Experiences (co-edited with S. Rummler, published by Information
Science Reference, IGI-Global, Inc. in 2009).
Jason Kucsma (jkucsma@metro.org) is the Emerging Technologies
Manager at the Metropolitan New York Library Council where he
manages METRO's Digitization Grant Program and is the point
person for member inquiries related to the resources, training and
referral services associated with digitization, digital preservation and
emerging technologies issues. Jason received his M.A. in American
Culture Studies from Bowling Green State University and an M.L.S.
from the University of Arizona School of Information Resources and
Library Science. He is currently a part-time lecturer in Rutgers's
Library and Information Science graduate program, and is a recent
graduate of ALA's 2009 Emerging Leaders Program.
Table of Contents
Foreword vii
Preface ix
Part I - Small is Beautiful: Planning and Implementing
Digitization Projects with Limited Resources
DIY Digitization: Creating a Small-scale Digital Zine
Exhibit 1
Digitizing Civil Rights: An Omeka-based Pilot Digital
Presence for the Queens College Civil Rights Archive 23
Digitization on a Dime: How a Small Library and a Big
Team of Volunteers Digitized 15,000 Obituaries in Just
Over a Year 41
Building the ALBA Digital Library 57
Digitization and Access of Louisiana Oral Histories: One
Oral History Center's Experience in the Digital Realm 68
Digitizing a Newspaper Clippings Collection: a Case Study
and Framework for Small-Scale Digital Projects 86
METRO Grant Success Story: Waterways of New York
Project 100
Part II - A Diverse Digital Landscape: Digital Collections in
Public Libraries, Museums, Cultural Heritage Institutions,
and Knowledge-Based Organizations
Managing Rights in a Medium Scale Audio Digitization
Project 125
The In-House Digital Laboratory: Possibilities and
Responsibilities 136
Building a Virtual Library - A Case Study at The Library of
The Jewish Theological Seminary 154
Local Objects, Local People, Local History: Creating the
Wisconsin Decorative Arts Database 172
Digitizing the Rare Book Collection of the Leo Baeck
Institute 185
The Caprons of Paris: A Digitization Project in a Small
Library System 195
The Mass. Memories Road Show: A State-Wide Scanning
Project 223
Picturing the Museum: Education and Exhibition at the
American Museum of Natural History 242
Part III - The Digital Campus: Digitization in Universities
and Their Libraries
Developing an Institutional Repository at Southern New
Hampshire University: Year One 261
Digitization of the Yale Daily News Historical Archive 274
New Jersey Digital Legal Library 289
Pratt Institute: A Historical Snapshot of Campus and Area 302
Scaling Back for an "Experimental" Collection 321
From Confusion and Chaos to Clarity and Hope:
Reorganization of Work Flows, Processes, and Delivery
for Digital Libraries 333
Digitizing Colorado State University's Historic Photograph
Collection: A Case Study 355
Entering the Digitization Universe: One Catalog Librarian's
Experience at an Academic Library 370
From Argentina to Zambia: Capturing the Digital A to Z's
of a Child Art Collection 383
Special Collections, Digitization, and the Classroom: A
New Model 402
Part IV - One Plus One is Greater Than Two: Collaborative
Projects
Digital Treasures: The Evolution of a Digital Repository in
Massachusetts 423
Collaborative Digitization Goes Local 435
Picturing the Past and Planning for the Future: Central
Florida Memory 450
Apollo 13.0: Digitizing Astronaut Jack Swigert's Apollo
Documents 470
Collaborative-Centered Digital Curation: A Case Study at
Clemson University Libraries 490
The Craft Revival Project: Library Leadership in Creating
Connections between Small Cultural Institutions 503
Hudson River Valley Heritage: A Journey in Collaborative
Digitization 518
Collaborating for Success: A Cross-Departmental
Digitization Project 541
Using Omeka to Build Digital Collections: The METRO
Case Study 556
Foreword
Dottie Hiebing (METRO Executive Director)
For more than 45 years, METRO has worked to provide opportunities
for libraries to share best practice strategies to address many critical
needs. In these efforts, we have often seen that the best learning
comes through examples of libraries that have addressed important
challenges successfully - and sometimes not so successfully. This has
been especially true in efforts to support large and small digitization
projects.
For more than a decade - and continuing today - digitization
has been established as an essential focus for many libraries as well as
for research centers, museums, and cultural and arts organizations.
METRO has worked to support our members in these efforts with a
range of grants, training programs and instructional materials.
Digitization in the Real World represents a significant new
milestone in our commitment to providing library professionals with
the hands-on experience and guidance they need to plan, execute, and
manage digitization projects over the long term. In many ways, the
examples presented in this volume show library professionals how to
maximize the value and impact of digitization efforts for their libraries
and their users.
This book also represents the first self-published text METRO
has ever sponsored. As we continually look for new ways to help
libraries stay ahead of the curve in digitization, technology and other
areas, this strategy has the clear potential to be a major focus of our
work in the years ahead. We will welcome your feedback and look
forward to seeing how self-published materials such as this can
support our mission and your needs moving forward.
On behalf of METRO, I would like to congratulate and thank
editors Kwong Bor Ng and Jason Kucsma and all of the members of
the library community who supported this project and who
contributed of their time and insight in the development of these
outstanding digitization case studies. They have created a vital new
resource to help libraries continue to advance important digitization
projects, and their efforts will have a profound and lasting impact on
the future of these efforts in the years ahead.
Preface
Kwong Bor Ng & Jason Kucsma
For more than a decade, digitization has been both a critical need and
a formidable challenge for libraries, archives, and museums around
the world. To support these important projects, the Metropolitan New
York Library Council (METRO) has been awarding annual grants to
support digitization projects in New York City and Westchester
County since 2005. Thus far, METRO has provided support for
approximately 40 digitization projects at 25 different institutions. In
those five years, we have learned a great deal about managing
digitization projects effectively. In these efforts, METRO members
have also shared best practice strategies in digitization through
project showcase events and through the work of the METRO-
sponsored Digitization Special Interest Group.
All digitization projects begin with some critical questions. How
do we start a digitization project? What standards should we use for
digital conversion and metadata? What are the best practices for
workflow? What equipment or software should we use? Should we
digitize in-house or outsource? What organizational or technological
obstacles should we anticipate, and how should we negotiate them?
Where can we turn for help in the middle of a project?
Naturally, the response to these questions will differ for different
institutions. Even discrete projects within an institution will have
many unique characteristics and challenges. But shared stories of
successes (and failures) can be immensely helpful in supporting future
digitization projects. To that end, Professor Ng came to METRO in the
summer of 2009 with a great suggestion. Why not collect some of the
most compelling examples of recent digitization projects? Many of us
are familiar with the large-scale mass digitization projects of recent
years. But Ng suggested — and we agreed — that there was a great
opportunity to share insights from lesser-known examples from the
"real world." That's not to say that large-scale projects don't pose their
own unique issues and learning opportunities for librarians,
archivists, and technologists. But many libraries are more likely to
proceed with smaller-scale digitization projects made possible by a
special need or unique opportunity, a first-time grant, or the special
dedication of a team of library professionals. Collectively, these efforts
can provide many invaluable perspectives and procedural models.
This book was initially conceived as an opportunity to highlight
digitization efforts in the New York metropolitan area. Our research
quickly showed that there were many other project examples worth
sharing. The response to our initial call for proposals was
overwhelming; we received hundreds of chapter proposals from all
over the world in just the first few months. Contacts from many of the
world's leading knowledge-based organizations, cultural institutions
and university libraries presented examples of projects representing a
wide range of topics, perspectives, approaches, concerns, and lessons-
learned.
Choosing which of these examples to present in the book was a
daunting task. We were
unable to include many great case studies. Each of the chapters
presented was reviewed in a double-blind peer-review process to
assess quality, accuracy and relevance. The 34 papers presented in
this book represent our best effort to present a diverse and
comprehensive overview of key issues in the management and
realization of digitization projects.
We have divided the case studies into four primary groups. The
first section focuses on small projects. They are digitization endeavors
that moved forward with limited resources and staffing. The second
group showcases digitization projects from diverse cultural
institutions including public libraries, museums, research institutes,
and cultural organizations. The third group consists of digitization
projects based on medium-sized collections at universities and their
libraries. The last group features projects that brought together
multiple institutions to work in collaboration on a project of mutual
interest.
This book would not have been possible without the participation
and hard work of all of the authors and reviewers involved, including
those who submitted chapters that we were not able to accommodate.
We're also greatly indebted to Dottie Hiebing, Executive Director of
METRO, for recognizing the need for this important resource and for
supporting this effort from inception. This is the first of what we hope
will become a series of instructional self-publishing projects
supported by METRO in the years ahead.
This is, above all, a book written by practitioners for practitioners
who together recognize the critical needs and goals in digitization in
our industry. Our hope is that it will be useful to students who are
preparing for a career in library or research science and to
practitioners who will shape the future of digitization for the library
community. We know reading these stories has been enlightening for
both of us, and we hope it will be for you as well. Thank you for
reading.
Part I - Small is Beautiful:
Planning and Implementing
Digitization Projects with
Limited Resources
DIY Digitization: Creating a Small-scale
Digital Zine Exhibit
Melissa L. Jones (College Summit)
Abstract
The Barnard Library Zine Collection is an innovative special collection
of dynamic popular culture artifacts. The zines in the collection
provide a democratic and vibrant glimpse into the movements and
trends in recent feminist thought through the personal work of artists,
writers, and activists. The author finds that in order to improve access
to and generate interest in such niche collections, institutions have a
responsibility to overcome barriers to digitization and begin sharing
their collections online. This chapter discusses the development of
Barnard's first zine digitization project: the Elections and Protests:
Zines from the Barnard Library Collection Online Exhibit, launched
in the summer of 2008. The successful project demonstrates that it is
possible to build effective and engaging small-scale digital collections
using simple and inexpensive technologies.
Keywords: Barnard College Library, Copyleft, Copyright, Education,
Elections, Lesson plans, Online exhibit, Political zines, Primary
sources, Protest, Special collections, Zines.
Introduction
The Barnard College Library began collecting zines in 2003 in an
effort to document third wave feminism and riot grrrl culture. Zines
are self-published, usually inexpensively produced works by writers
who subscribe to a Do It Yourself (DIY) philosophy. Generally, zines
are created out of an interest to communicate or express ideas that
might not otherwise find acceptance in the mainstream media.
Although zines as we know them today were born from the punk
movement of the early 1970s (Duncombe, 1997, p. 21), they are part of
a long history of small-run and "amateur" publication. Whether
calling colonialists to arms in the days of the American Revolution or
subverting censorship and challenges to free speech in Soviet Russia
(Wright, 1997), alternative publications are a natural and important
tool for preserving free speech.
Although zines are low-rent ephemera, several public and
academic libraries across the country have begun to recognize their
value. At the forefront of the field, Barnard's collection has nearly
2,500 holdings providing unmediated access to the voices of young
women on such subjects as race, gender, sexuality, childbirth,
motherhood and politics. Zine Librarian Jenna Freedman's outreach
and advocacy work helps to legitimize zines, not as radical historical
footnotes but as valid literary and historic works worthy of collection,
preservation and study.
As the Zine Intern in summer 2008, my role was to help
Freedman to increase access to and interest in the Zine Collection. The
result of my work was Barnard Library's first digital collection, an
online exhibit entitled, Elections and Protests: Zines from the
Barnard Library Collection. This project employed a DIY approach to
digitization, making use of materials and resources at hand to solve
problems and overcome challenges rather than relying on mainstream
or out-of-the-box technologies. This project demonstrates that small-
scale digitization projects can be topical, useful and impactful for a
variety of stakeholders.
Literature Review and Needs Assessment
The literature surrounding zines reveals that, as unique primary
source documents, they can serve as valuable research tools.
Alternative press advocates such as librarians Chris Dodge and Jim
Danky argue that self-published ephemera like zines, handbills, and
military newspapers can provide a glimpse into a part of history that
includes the voices of marginalized individuals and groups which
would otherwise be lost were they not collected (Dodge, 2008).
Dempsey (2006) notes that to collect the ephemeral and radical
"long tail" is not enough; institutions have a responsibility to provide
users with access points and contextual materials in order to
maximize use. Liu (2007) notes that in order to better serve users,
"academic library Web sites should ... switch the focus from
presenting information arranged according to library functions and
resources to providing targeted and customizable tools and services to
library users ... and give users opportunities to express, share, and
learn." In addition to their value as historical documents, zines also
serve as powerful teaching tools for media literacy (Wan, 1999;
Congdon, 2003; Daly, 2005), but scholars and teachers need both
access to zines and support for teaching with these unique documents
in order to capitalize on this potential.
Lesk (2007) acknowledges the legal and philosophical issues that
are inherent in digitization work, but advocates strongly for
institutions and copyright holders to work together to overcome
challenges due to the potential value of digital materials for research.
In order to support online research, some public and academic
institutions have begun digitizing their special collections.
Unfortunately, at the time of this project, no public or academic
institution had moved to digitize its zine collection.
The lack of high-quality materials for studying and teaching zines
online makes interacting with the genre impossible for anyone
without physical access to a collection. Most public and academic
institutions allow access to their zine collections mainly through
catalog search. Some institutions occasionally mount online exhibits
that include scans of zine covers only.
This has been due, in part, to the same barriers that hinder other
digitization projects such as prohibitive cost, lack of time, and
technological limitations. Additionally, zine librarians and scholars
identify the intrinsically physical nature of the genre as another
reason not to prioritize zine digitization. Migrating zine content to a
digital form is seen by many in academia to undermine the very heart
of the genre, which is to be rooted in physical interaction between
zinester, zine, and reader. Duke University's Zine Librarian argues
that "...zines are created by hand, crafted with paper, scissors, tape,
glue, staples. They were meant to be handed from person to person,
physically shared. The experience of handling zines in person, turning
each page to reveal intimate secrets, funny comics, and poetry, can't
be duplicated on-line. You would get the content, but miss out on the
physical experience" (Wooten, 2009). Any academic digital Zine
Collection would need to be very conscious of its treatment of digital
surrogates.
Concerns about copyright, permission and privacy create another
barrier to digitization. Copyright is a sticky issue when it comes to
zines, a genre that is by definition created to be shared. Thus,
many zines contain a copyleft statement, or some other notation of
whether the owner has given permission for its contents to be
reproduced. "Copyleft" is a term coined by open-source software
pioneers to describe a "flipping" of traditional copyright laws that
allows content owners to grant broader permission for their work to
be shared. This "General Public License" can be applied in any
situation where copyright might apply, including software, books,
images and music (Soderberg, 2002). Generally, copyleft permission
or GPL is considered to be conditional; zinesters who select copyleft
status for their work, or those who claim no legal protection at all, still
expect to be credited, or at least respected, for their work. It is poor
zine etiquette to steal, borrow, or sell someone else's zine for personal
gain.
Private zine online library and archive groups, run by zinesters
and fans, have developed to fill the void of zines on the web. The sites
digitize a large number of zines and serve as valuable repositories of
content for experts in the field. Because they have grown organically
from the zine community, these sites maximize their relationships to
avoid and address concerns about copyright.
For Barnard Library, the benefits of digitization
provided an incentive to overcome potential barriers, challenges and
costs. An Access and Use Survey of known users administered in
2008 revealed that, while the Zine Collection has a strong contingent
of feminist and zinester stakeholders, Barnard Library could be doing
more to attract users outside the immediate scholarly and cultural
community (see Figure ZINE-1).
Figure ZINE-1: Results of the Barnard Library Zine Collection Access and Use
Survey, administered to 25 known users in July 2008.
The survey confirmed that a small-scale digitization project would
be a valuable addition to Barnard Library's existing services. 81% of
known users reported that they would use curated, online exhibits
about zines and zine history. Additionally, 81% of users said that they
would use digital scans of selected zines.
The findings of the survey reflect the expectation by users that a
library's website provide more than just access to information. By
digitizing the popular and well-respected zine collection, Barnard
Library could capitalize on the strength of its special collection to meet
the needs of existing users, attract new users, and fill a need in the
existing digital zine landscape. Additionally, a digital Zine Collection
could help spread the word about the value of zines as historical
documents and teaching tools to a new generation of potential
stakeholders.
Project Planning
After making the decision to create a small digital zine collection, I
created a project plan that included setting clear goals for the project.
Goal Setting
Digitizing even a small portion of Barnard's Zine Collection would
have many benefits for the institution, its users, and the historical
record. These included:
Improving access: Currently, membership in the
Barnard/Columbia learning community is required in order to secure
free access to the zine collection. Digitization would allow zines to be
downloaded and shared easily, improving the ability of people from
across the globe to access and learn from the collection.
Raising awareness about zines as legitimate historical
objects: Freedman's work as an advocate for zine and other radical
special collections would be complemented by a well-selected
digitization project that is supported by descriptive and educational
materials.
Highlighting Barnard's women's studies collection and
drawing researchers to the institution: Barnard's Zine
Collection sets its women's studies research collection apart from
other academic institutions. An online exhibit of materials from the
Zine Collection could emphasize its uniqueness and eventually bring
more researchers to the collection.
Preservation of the collection: Because most zines are
produced cheaply using poor quality paper and inks, long-term
conservation can be an issue. Digitizing zines makes their content
available to future generations of researchers, students, and other
stakeholders while preserving their physical form.
Project Scope
The scale of this project was by necessity very small. No funds were
designated for the project. The site would need to be built and
function within the existing Library website's structure; no money was
available for purchasing a Content Management System or developing
a complex metadata or image database. All work would need to be
done using hardware and software already in Barnard Library's
possession, or available open-source on the Web. As the Zine Intern, I
would be the sole staff member available to work on the project.
Freedman would supervise and approve my work. The project would
need to be completed during my summer internship, which lasted
only 100 hours over the course of 10 weeks.
Content Selection
Digitizing the collection as a whole proved to be too
time-consuming and technologically complex given these limitations.
Selecting a small group of zines in a given theme or subject area to
digitize first seemed a good model to begin with. Prioritizing
digitization by demonstrated user need is a model that has been
successful for other institutions. The University of Warwick in the UK,
for example, developed an innovative research project in which
students created digital surrogates of the 18th century French plays
they used in their coursework (Astbury, 2006). Following a similar
needs-based model would ensure that Barnard's first digital collection
would be used by its most immediate stakeholders.
In 2008, the country was gearing up for an historic presidential
election. Earlier that year, the previous Zine Intern, Julie Turley, had
created an exhibit of "Election and protest themed zines" to connect
the institution's holdings with current events. The physical exhibit,
which lived in the library's lobby, featured copies of selected zines and
photocopied extracts of pages. From the Republican National
Convention to the presidential election, from deciding to take your
child to a political rally to challenging politicians to be responsible to
their electorate, the featured zines addressed participation in — or
protest against — the American political process.
The exhibit was a natural fit for this digitization project. With the
presidential election only months away, we knew that conversations
about the political process would be a hot topic on campus. The
selected zines offered a little-seen counterpoint to mainstream
political coverage, rejecting voting as the sole means to make change
in this nation. Moreover, educators across the nation would be looking
for ways to talk about elections and the political process in their
classrooms. It would be an excellent opportunity to demonstrate that
zines can be relevant political and educational tools. Since zines are
political in nature and often overtly political in topic, our digital
collection would be reflective of the genre as a whole, even though we
could only digitize a small number of zines. Finally, because the zines
in this subset were already on display in the lobby, we knew that none
were in need of conservation work or otherwise in danger of being
damaged by the process of digitization.
Project Implementation
To maximize the benefits of digitization while addressing the barriers
faced by the institution, I undertook a multi-step process for digitizing
and presenting Barnard zines online. The process, like the zines
themselves, was low-rent, low-tech, and outside the mainstream. The
DIY approach was limiting in many ways, but also served as an
excellent opportunity for learning and innovation.
Task                                               Timeline
Needs assessment and literature review             Prior to project start
Goal setting and scope definition                  Prior to project start
Content selection                                  May [day illegible]
[task name illegible]                              May 12th
Competitive landscape analysis                     May 12th through May 20th
Creating site maps and wireframes                  May 12th through May [day illegible]
Copyright requests to publish sent to zinesters    May 12th
Scanning and digitization                          May 19th through June 17th
Designing an intuitive user interface              May 20th through July 21st
Writing original content                           June 1st through July 18th
Usability testing                                  July 19th through July 21st
Site launch and publicizing                        July 23rd through July 24th
Evaluation and reporting                           Ongoing
Figure ZINE-2: Project Timeline - May 5th through July 24th, 2008
Copyright Status and Securing Permissions
After selecting the zines to be digitized, securing permission to
present their content on the web was the next step. Educational use,
such as the creation of an exhibit, would likely fall within any
zinester's definition of copyleft. Only one zine of the ten selected,
"Radical Cheerbook," contained an explicit copyleft statement. We felt
confident that we could use its content in the exhibition.
Because the other nine zines selected for this exhibit contained
some kind of copyright statement or did not contain an explicit
copyleft statement, an effort was made to contact and secure
permissions from the original author. This effort was difficult,
however, since many zines were published using pseudonyms or
contain contact information that is out of date. To track down the
zinesters, I used a combination of Google searches, MySpace, and a
pre-catalog Microsoft Access database that Freedman maintains to
identify current email addresses. For one zinester, I was only able to
identify a mailing address, so I sent a letter and awaited a response.
By the time the site was ready to go live in mid-July, I had received
written permission to publish from six zinesters, with most expressing
excitement about the project. One zinester requested that I send scans
of the specific pages I'd hoped to include before giving permission. At
the bottom of each zine's page on the site, I made a note that the
copyright holder had given permission for Barnard to use scans from
the zines.
In three cases, I was not able to secure permission before the
launch of the website. In these cases, I added a note to each zine's
page that we had made a diligent effort to contact the copyright holder
and would remove the images used in the event that there was an
objection. I also made the decision to include only minimal excerpts
from these zines as compared to the more extensive scans used from
the zines for which we had permission.
Site Design & Comparative Landscape Analysis
Once permissions requests were sent, I focused my work on
designing the site's architecture and layout. Close analysis of the
features of similar sites can be a good way to begin planning. In order
to understand how zines and zine-like publications can be presented
online, I analyzed five sites with collections similar to Barnard's.
Because no academic institution had a large-scale digital zine project
at the time, I reviewed three sites run by private groups.
I also reviewed two academic digital collections that feature radical or
obscure publications.
My analysis revealed several qualities that most online exhibits of
zine-like material share.
Asset management: (1) All five sites included full-color image
scans with legible text and graphics; (2) All but one site included an
option to download the asset in PDF form; (3) Four out of five sites
included descriptive metadata about subject, author, and publication
date to aid in discovery and to give context to the asset.
Navigation: (1) Every site evaluated had a descriptive homepage
and a consistent look and feel; (2) All five sites utilized global
navigation on each page to keep the user oriented.
Search and discovery: (1) Four out of five sites allowed users to
browse for a zine by title; (2) Four out of five sites offered a keyword
search function; (3) None of the sites offered a search by author or
issue number; (4) Four out of five sites made searching or
browsing for a known item simple and pleasurable.
Tools and customization: Every site evaluated offered a
"printer-friendly" version of its assets.
Aesthetics and usability: (1) Every site took care to ensure that all
links and functions worked as they were expected to; (2) Four out of
five sites used some type of backend content-management system to
organize assets.
I attempted to include in my own site every quality that all of the
sites evaluated had adopted.
Site Name and URL | Launch Date | Assets
Zine Library.net (http://www.zinelibrary.info/) | None given | "hundreds" of zines
The Queer Zine Archive Project (http://www.qzap.org) | Nov 2003 | 154 issues
Punk Zine Archive (http://www.operationphoenixrecords.com/archivespage.html) | 2004 | 120 issues
Ling Long Woman's Magazine @ Columbia University (http://www.columbia.edu/cu/lweb/digital/collections/linglong/index.html) | 2005 | 241 issues
Anarchism Pamphlets in the Labadie Collection @ The University of Michigan (http://www.lib.umich.edu/spec-coll/labadie/) | 1999 | 478 pamphlets
Figure ZINE-3: Sites Evaluated for Competitive Landscape Analysis
There were other qualities present in some sites but not in others.
These included RSS feeds, customizable user accounts and high-tech
page turners. Because these qualities appeared in only some sites, I
considered them to be optional for my site.
Interestingly, none of the sites offered any curriculum or
supporting finding aids that would add necessary context to the
materials. I planned to include lesson plans and a bibliography to
accompany my zine scans.
Creating a sitemap and wireframes.
I first sketched wireframes for my site using paper and pencil,
then translated those sketches into digital files. The wireframes turned
out to be ambitious, and due to time and skill constraints, I was forced
to scale down my original vision, but the creation of the sitemap and
wireframes helped me synthesize all my ideas for the site into one
visual presentation.
Figure ZINE-4. Final site map for the exhibit.
Generating Content
Scanning and digitization.
Ideally, for this project, preservation-quality scans should be
made of each zine, along with its entire contents at the highest
possible resolution in addition to any presentation and/or thumbnail
versions. However, I was limited in my ability to create high-
resolution scans due to several factors. The first issue was the
constraints of the hardware and software at my disposal. I worked on
scanners and computers that Barnard undergraduates have access to
in the library's computer lab. More sophisticated equipment might
have resulted in better scans.
The condition of the zines themselves also contributed to poor
resolution. Because most zines are produced using cheap materials,
many of the oldest were beginning to deteriorate as the paper
yellowed and the ink faded. Additionally, poor photocopying
resolution in the original zine made some of the digitized page images
appear grainy and pixelated.
After much experimentation, I balanced preservation and
presentation needs with time and resource scarcity by scanning each
zine once at 600 DPI or higher. Next, I saved two JPG versions of each
zine: a presentation copy at a maximum height of 600px and a
thumbnail copy at a maximum height of 90px. I preserved the aspect
ratio of each scan each time I resized the image. When necessary, I
used image editing software, either the open-source GIMP
(http://www.gimp.org) or Adobe Photoshop, depending on what was
installed on the computer I was working on that day, to tweak the
contrast levels of the scan and improve legibility. For each zine's
cover, I created a slightly larger thumbnail which is presented on the
Home page and the "Featured Zines" page. The last step was to
convert all of the JPGs to PDF and create a print version of each zine
for users to download.
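The resizing arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the exhibit (the resizing itself was done in GIMP or Photoshop): the function name fit_to_height and the sample master dimensions are hypothetical, chosen to show how a maximum height of 600px or 90px determines the width when the aspect ratio is preserved.

```python
def fit_to_height(width, height, max_height):
    """Scale (width, height) down so the height is at most max_height.

    The aspect ratio is preserved; images already short enough are left alone.
    """
    if width <= 0 or height <= 0 or max_height <= 0:
        raise ValueError("dimensions must be positive")
    if height <= max_height:
        return width, height
    scale = max_height / height
    # Round the width to whole pixels, never letting it collapse to zero.
    return max(1, round(width * scale)), max_height


# Hypothetical master scan: a 5.5 x 8.5 inch page scanned at 600 DPI.
master = (3300, 5100)
presentation = fit_to_height(*master, max_height=600)  # presentation copy
thumbnail = fit_to_height(*master, max_height=90)      # thumbnail copy
print(presentation, thumbnail)
```

Deriving both copies from the single high-resolution master, rather than shrinking the presentation copy again, avoids compounding JPG compression artifacts.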
These digitization decisions allowed me to produce legible copies
of each zine while conserving disk space. Because I didn't have access
to an image database or a content management system, I simply
organized all of the files in a series of folders on my desktop, giving
each file a descriptive name following a clear convention. When the
exhibit was complete, these folders were uploaded, along with the
HTML and CSS files, to the Barnard server.
Generating metadata.
I was able to take the metadata for the exhibit from the existing
OPAC records. Each zine is currently cataloged in a MARC
record as part of the Barnard/Columbia joint OPAC, CLIO. Current
metadata includes: title, an author or creator (when applicable),
physical description, a publisher and date, Library of Congress subject
headings and a summary or abstract. For each zine's gallery page, I
used only the author, title, summary and call number fields.
Additionally, I included a link to CLIO so users could locate the zine,
check on its availability, and order it through interlibrary loan. I was
able to add additional metadata about individual zinesters who
responded to my copyright requests, including links to each zinester's
current projects or personal websites.
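The field selection described above can be sketched as a simple filter over a catalog record. This is a minimal illustration, not the way the exhibit pages were actually produced: the dictionary keys, the sample values, and the placeholder catalog URL are all assumptions, and no real MARC tags or CLIO addresses are implied.

```python
# Fields shown on each zine's gallery page, per the description above.
GALLERY_FIELDS = ("author", "title", "summary", "call_number")


def gallery_metadata(record, catalog_url):
    """Keep only the gallery-page fields that have a value, plus a catalog link."""
    page = {field: record[field] for field in GALLERY_FIELDS if record.get(field)}
    page["catalog_link"] = catalog_url
    return page


# Hypothetical record; every value here is invented for illustration.
record = {
    "title": "Sample Zine",
    "author": None,  # an author or creator is recorded only when applicable
    "physical_description": "24 p. : ill. ; 22 cm",
    "publisher": "Self-published",
    "date": "2004",
    "subjects": ["Elections", "Protest movements"],
    "summary": "Invented summary text.",
    "call_number": "Zines X00x",
}
page = gallery_metadata(record, "https://example.org/catalog/record/123")
print(page)
```

Fields with no value, such as a missing author, are simply left off the page rather than displayed empty.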
Writing Original Content
A major component of the site was the contextual material that
would add value and meaning to the zine scans for users. To meet this
need, I wrote an "About" page describing the exhibit, as well as a "For
Teachers" page that included:
• a brief explanation of why zines make good teaching tools
• three essential questions related to zines which could be used
to frame curriculum planning
• a list of suggested resources for educators
The most time-consuming content pieces to develop were the
three lesson plans designed to help educators teach with the zines in
the exhibit. Drawing on my background as a teacher, I designed these
lesson plans around essential questions related to media literacy and
social science content areas, then aligned them to three different
learning levels: intermediate, secondary, and post-secondary. The
lesson plans are student-centered and challenge students to interact
with the zines in the exhibit through discussion and evaluation.
Building and Testing the Site
Ideally, the user interface for any site should be intuitive and
promote discovery. For this project, I was limited to very basic web
design software and programming languages that have a low barrier
to mastery, but I was determined to make the site as usable as possible
given the constraints.
To build the site, I again used hardware and software in the
Barnard Library computer lab. I first attempted to build the site using
Microsoft FrontPage, and then switched to an open-source HTML
editor called Mozilla Kompozer (http://www.kompozer.net). Both
FrontPage and Kompozer have "WYSIWYG" interfaces (an acronym
for "what you see is what you get"), in that they allow users to create
web pages using an interface that mirrors how the final product will
appear (Myers, 1998). However, I needed to have a firm grasp on what
was happening in the actual code behind my pages as I made changes.
Additionally, I wanted to be able to customize my site, and the
templates available in FrontPage and Kompozer felt limiting.
After a few days of struggle, I settled on developing the code for
my site by hand using the simple text editing software available on
most computer operating systems. The flexibility of being able to work
on my files from any location made this project much easier to
complete in only 10 weeks. I created and edited my files in Microsoft
Notepad at Barnard Library, and could easily open them at home on
my Apple laptop using either TextEdit or a free trial version of the
excellent application Coda (http://www.panic.com/coda/), which
combines a WYSIWYG interface with an easy-to-use text editor.
During the initial coding phase, I began by building a framework for
each page with a common layout, menu bar, and footer, using simple
HTML tags such as those for images, links and tables. I also selected a
patriotic red-white-and-blue color scheme and created an image
banner to run along the top of the page that included the site's title in
a typewriter-style font and some randomly-placed stars to mimic a
zine-like feel. Later, the color scheme was changed to a more punk-
inspired pink and black, to better reflect the lack of overt patriotism
expressed in the zines themselves.
Once a basic page template was complete, I created a Cascading
Style Sheet (CSS) file which governed the look and feel of each page
and made coding the rest of the site easier. Using a CSS file is a simple
way to add style (such as fonts, colors and spacing) to every page in a
website without having to edit each page individually (Bos, 2010). In
order for the CSS file to work, I added a line of code in each page's
HTML file that directed a user's web browser to "link" to my style
sheet file, called "text.css". This file dictated the background and font
colors of each part of each page, the margins for the different dividers
and tables, and even the spacing of thumbnail images in my photo
gallery. Before beginning this project, I had never worked in CSS.
However, I found the language simple to learn and easy to use once I
understood the fundamentals. As I worked, I referenced countless
tutorials and open-source code available on the web to hack my way
through the rest of the coding process.
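The shared style sheet arrangement described above can be illustrated with a small page template. This is a sketch under stated assumptions rather than the exhibit's actual markup: only the file name "text.css" comes from the account above, while the function render_page, the page titles, and the body text are hypothetical. The point is that every page carries the same one-line link to the style sheet, so a change to that single file restyles the whole site.

```python
from html import escape

STYLESHEET = "text.css"  # style sheet file name given in the text


def render_page(title, body_html):
    """Return a minimal HTML page whose head links to the shared style sheet."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{escape(title)}</title>\n"
        # The one line each page needs in order to pick up the shared styles.
        f'  <link rel="stylesheet" type="text/css" href="{STYLESHEET}">\n'
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


# Hypothetical pages; titles and bodies are placeholders.
pages = {
    "about.html": render_page("About", "<p>About the exhibit.</p>"),
    "teachers.html": render_page("For Teachers", "<p>Lesson plans.</p>"),
}
for name, html_text in pages.items():
    print(name, len(html_text))
```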
The biggest coding challenge was creating a workable photo
display gallery to present my zine scans. During the site mapping and
wire-framing (see Figure 8), I had determined that I wanted users to
interact with thumbnail versions of a zine's pages and select which
pages they'd like to see in a larger, presentation-quality view. At first, I
tried copying an open-source photo gallery HTML file that I found
online. This allowed me to build the bare bones of my page layout, and
indeed included thumbnail images and presentation-quality views.
However, the photo gallery had one weakness: every time I moved my
mouse off of the thumbnail I wanted to view, the presentation-quality
view disappeared! This would have made my site very difficult to use,
as it was impossible to scroll, zoom, or even save the presentation-
quality view while still keeping your mouse over the thumbnail view.
Finally, a programmer friend-of-the-collection helped out by
producing a small piece of JavaScript code that allowed me to keep the
presentation-quality view open without having to keep my mouse
perfectly still. This code allowed me to create a version of the site that
I was excited to test with users.
Usability Testing
Before going live, I tested the site's usability with three different
test subjects, each representing a different group of Barnard Library
stakeholders. Each test subject was given the same set of tasks to
complete. Figure ZINE-5 lists the tasks administered and whether or
not the subject was able to complete the task without guidance.
Figure ZINE-5: "Exhibits and Protest" Site Usability Tasks and Results.
In addition to identifying tasks that would be difficult for users to
complete, I also made anecdotal records of the test subjects'
comments and feedback. Based on the results of the testing, I made
the following improvements to the site:
• Increased the size of all fonts used by 1px
• Added a mouse-over function to each zine cover image on the
homepage that listed its title to aid in identification
• Added a "download all" link to a PDF containing all image files
associated with each zine
• Made the copyright documentation on each zine more
prominent
Launching and Publicizing
Upon completion of testing, the final version of the online exhibit
(http://www.barnard.edu/library/zines/exhibits/online/elections/index.html)
was launched on July 23rd, 2008. A link to the
exhibit was posted on the Barnard College Library homepage, and a
blog post about it was added to the institution's LiveJournal
(http://barnardzines.livejournal.com/). Emails were sent to a zine
librarians' listserv and to other contacts and friends of the Zine
Collection. I also sent a link to the exhibit to colleagues and friends in
K-12 schools across the country. Since the initial launch, Freedman
has continued to publicize the online exhibit in her talks and outreach
activities for the collection.
Results and Next Steps
Informal evaluation of the project demonstrates that it has begun to
meet its goals. User feedback on the site has been overwhelmingly
positive. By making Barnard's zines accessible on the web to millions
of people across the globe, the exhibit has indeed improved access to
the collection. In an age when discovery on the web is primarily done
through Google or other search engines (Belden, 2008), it is encouraging that
search terms such as "zine lesson plans," "zines and elections," and
"teaching with zines" consistently return the site in the first page of
search results. This suggests that many sites have linked to the
exhibit since its launch.
It is clear from the Access and Use Survey that such an effort
is both desired and respected by stakeholders. Because the exhibit
scanned only selections of the zines featured, not entire issues, it may not
be seen to contribute to the long-term preservation of the individual
artifacts. However, creating a home on the web for zines around a
contemporary issue can be seen to be contributing to the long-term
preservation of the genre by making zines relevant in the digital age.
It remains to be seen whether this site will indeed drive users to
the Zine Collection's other resources. Further evaluation should be
done in order to determine whether or not this exhibit is directly
contributing to increased access or use of the collection by
Barnard/Columbia community members, outside researchers, K-12
educators, and other stakeholders. Repeating the Access and Use
Survey annually may be a step in that direction.
Long-term sustainability and continued effectiveness of the
exhibit remain open concerns. Freedman or future interns will need to take on
the responsibility for maintaining and updating the exhibit as
necessary over time. As Barnard Library further develops its web
presence, the exhibit's look, feel, and even its content could become
outdated. At this time, however, there is no reason why the exhibit
cannot stay live for the foreseeable future without financial cost or
significant staff time commitment. In order to maximize the exhibit's
effectiveness, Barnard could consider:
• Continuing to promote and publicize the current exhibit,
focusing on alternative outlets such as Wikipedia, educator
websites, and media literacy blogs
• Developing an evaluation plan to determine the impact of the
online exhibit on the stated project goals
• Securing site analytics data on page usage and download stats
to measure usage of the exhibit, as well as effectiveness of
marketing techniques
Due to the support for this project from users, Barnard could
consider digitizing more of the collection. Next steps could include:
• Creating more online exhibits around themes or subjects of
interest to stakeholders if this first exhibit proves valuable
• Creating more subject guides, lesson plans and bibliographies
about zines and zine history, and making them available online
• Collaborating with Columbia's New Media Teaching and
Learning group in order to ensure that the user interface
promotes teaching and learning with zines as primary sources,
art objects and media literacy teaching tools
An open question is whether or not Barnard should move forward
with digitizing the entire collection. Although this project was able to
overcome many barriers to digitization of the genre (Wooten, 2009), a
larger-scale project might open the door to more difficulty with
copyright, permissions, privacy, and preserving the user experience of
interacting with a zine's physical form.
Conclusion
This project showed that it is possible to create innovative web
resources for a variety of stakeholders with a minimum level of
technological know-how. It provided a great opportunity for
Barnard to continue to lead in the field of zine librarianship. This
online exhibit supports teaching, learning, and research with quality
and findable digital assets that highlight Barnard Library's strengths.
With the current low barrier to web authorship, it is not enough for
academic special collections to simply have a web page. Instead, more
libraries and institutions can take advantage of the resources
available to them (whether it is an eager intern, an exciting
collection, or a timely theme) to create a resource that will meet the
needs of users and make their holdings accessible to and available for
generations to come.
References
Astbury, K. (2006). French theatre of the first empire: Enhancing
research-based learning. Warwick Interactions Journal 28.
Retrieved from
http://www2.warwick.ac.uk/services/cap/resources/pubs/interactions/archive/issue28/abastbury/astbury
Belden, D. (2008). Harnessing social networks to connect with
audiences: If you build it, will they come 2.0? Internet Reference
Services Quarterly, 13(1), 99-111.
Bos, B. (2010). Cascading style sheets. W3C Consortium. Retrieved
from http://www.w3.org/Style/CSS/
Congdon, K. G., & Blandy, D. (2003). Zinesters in the classroom:
Using zines to teach about postmodernism and the communication
of ideas. Art Education, 56(3), 44-52.
Daly, B. O. (2005). Taking whiteness personally: Learning to teach
testimonial reading and writing in the college literature classroom.
Pedagogy, 5(2), 213-246.
Dempsey, L. (2006). Libraries and the long tail: Some thoughts about
libraries in a network age. D-Lib Magazine, 12(4).
doi:10.1045/april2006-dempsey
Dodge, C. (2008). Collecting the wretched refuse: Lifting a lamp to
zines, military newspapers and Wisconsonalia. Library Trends,
56(3), 667-677.
Duncombe, S. (1997). Notes from underground: Zines and the politics
of alternative culture. New York: Verso.
Jones, M. (2008). Elections and protest: Zines from the Barnard
Library collection. Barnard Library Website. Retrieved from
http://www.barnard.edu/library/zines/exhibits/online/elections/index.html
Lesk, M. (2007). From data to wisdom: Humanities research and
online content. Academic Commons. Retrieved from
http://www.academiccommons.org/commons/essay/michael-lesk
Liu, S. (2008). Engaging users: The future of academic library web
sites. College & Research Libraries, 69(1), 6-27.
Myers, B. (1998). A brief history of human computer interaction
technology. ACM interactions, 5(2), 44-54. Retrieved from
http://www.cs.cmu.edu/~amulet/papers/uihistory.tr.HTML
Soderberg, J. (2002). Copyleft vs. copyright: A Marxist critique. First
Monday, 7(3). Retrieved from
http://ojphi.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/938/860
Wan, A. J. (1999). Not just for kids anymore: Using zines in the
classroom. Radical Teacher, 55, 15-19.
Wooten, K. (2009). Why we're not digitizing zines. Duke University
Libraries Digital Connections Blog. Retrieved from
http://library.duke.edu/blogs/digital-collections/2009/09/21/why-were-not-digitizing-zines/
Wright, F. (1997). The history and characteristics of zines. The Zine &
E-Zine Resource Guide. Retrieved from
http://www.zinebook.com/resource/wright1.HTML
Digitizing Civil Rights: An Omeka-based
Pilot Digital Presence for the Queens
College Civil Rights Archive
Valery Chen, Jing Si Feng, Kevin Schlottmann
(Queens College, CUNY)
Abstract
The Queens College Civil Rights Archive of the Department of Special
Collections partnered with the Queens College Graduate School of
Library and Information Studies to create a pilot web presentation
using the open-source Omeka platform. Phase I of the project,
conducted during the Spring 2010 semester, outlined a method for
institutions of limited means to enter into the world of digitization
using existing resources while highlighting the difficulties involved
with metadata and IT support, and the advantages of involving
graduate students.
Keywords: Civil rights, Digitization project, Omeka, Plug-ins.
Introduction
In the spring of 2010, the Queens College Civil Rights Archive of the
Department of Special Collections partnered with the Queens College
Graduate School of Library and Information Studies to create a pilot
web presentation using the open-source Omeka platform. The
synergy created between the technological skills found in the library
school faculty and student body and the desire of the Civil Rights
Archive to begin digitization of key holdings allowed the rapid
creation of a powerful web presentation platform. The process also
outlined a method for institutions of limited means to enter into the
world of digitization.
Queens College Civil Rights Archive
The Civil Rights Archive of the Queens College Department of Special
Collections and Archives collects published and unpublished works
relating to civil rights activities such as personal papers, community
materials, organizational records, non-print materials, and artifacts. It
also conducts oral histories to supplement its collections. The archive
is particularly strong in materials documenting civil rights work by
Queens College students during the early 1960s. The Archive seeks to
provide evidence of the under-documented Northern involvement in
the civil rights movement.
The Archive was founded in late 2008 around an estimable
collection of personal papers donated by alumnus Mark Levy. Since
then almost a dozen other personal collections relating to civil rights
work in the 1960s have been donated by College alumni, and the
Archive continues to actively collect in this area.
Queens College Graduate School of Library and
Information Studies
The Queens College Graduate School of Library and Information
Studies prepares library/information service professionals to meet the
information and literacy needs of the New York metropolitan region
and beyond. It is the only American Library Association accredited
program for library and information studies within the City University
of New York. The school prepares graduates to serve a broad segment
of the metropolitan area's multicultural, multiethnic and multi-lingual
population in a variety of institutional and informational settings.
Through research, publication and other forms of scholarly activity,
the school contributes and transmits new knowledge to society and
the profession. The faculty provides opportunities for students to
attain the competencies needed to participate in the evolving
electronic age by providing a technologically rich teaching/learning
environment.
Project Origin
The Queens College Department of Special Collections, wherein the
Civil Rights Archive is located, was acutely aware of the need for its
collections to have a digital presence. It will soon be true that archival
materials that are not electronically accessible in some way, whether
via an OPAC or on a website, will be no better served than in a dark
archive. Given the limited resources of an urban public university, the
Department had been unable to secure sufficient financial and
technological support for an independent digitization project. Head of
Special Collections Dr. Ben Alexander is also teaching in the Graduate
School of Library and Information Studies, and he approached Dr.
Kwong Bor Ng to discuss a mutually beneficial way to begin the
process of building a digital presence. Drs. Ng and Alexander decided
to expand the Special Collections Fellowship program, which provides
archival graduate students at Queens College with a broad range of
professional archival experience, to include a technology component.
Dr. Ng selected two graduate students with extensive coding skills to
do the actual work of creating an Omeka presentation website. The
hope was to create a mutually beneficial arrangement: under Dr. Ng's
supervision, the two graduate students were able to gain real-world
experience in building an Omeka platform, while the Civil Rights
Archive was able to lay sufficient groundwork to seek grant funding in
support of a larger digitization project.
Staffing/Workflow
Drs. Alexander and Ng served as project coordinators. The semantic
team, which was co-extensive with the Department of Special
Collections staff, consisted of Dr. Alexander, Archives Adjunct Katie
Hughes and Archives Assistant Kevin Schlottmann. The technical
team was headed up by Dr. Ng, who supervised two of his technology
graduate students, Valery Chen and Jing Si Feng. They were given
academic credit as independent study students and Special Collections
Fellows to build the Omeka website. The semantic team was
responsible for selection, digitization, and metadata creation. The
technological team was responsible for the installation and
development of the Omeka presentation. Both teams were involved in
the development of the Dublin Core metadata schema, and they also
collaborated in creating the user experience of the website.
The project began with a meeting in February 2010. This meeting
was initially a brainstorming session, but the teams were able to agree
to the basic semester goal of a pilot website, as well as a rough
timeline. Once the semantic team completed selection and the
metadata schema was ready, a few items from the Civil Rights Archive
were digitized per week and forwarded to the technological team.
Meanwhile, the technological team was preparing the Omeka website
for import of digital items. This continued as an iterative process for
the entire Spring 2010 semester. At biweekly meetings, the website
and the digital items were discussed, and both were constantly
improved.
Implementation - Semantic Team
The semantic team, consisting of the Department of Special
Collections staff (Department Head Dr. Benjamin Alexander, Archives
Adjunct Katie Hughes and Archives Assistant Kevin Schlottmann) was
responsible for selection, digitization, and metadata creation.
Selection
When the semantic team began discussing what to digitize, it
considered materials from the Civil Rights Archive, the College
Archive, the Performing Arts collection, and the Rare Book, Zine, and
Artists Book collections. It quickly became clear that the civil rights
materials were best suited for this pilot project, for reasons such as
processing status, donor relations, fitting into the Archive's specific
mission of engaging with the broader community, copyright status of
the materials, and the attraction of having students continue to work
with the material.
First and foremost, the Civil Rights Archive had the best-
cataloged collections in the Department. The majority of the Archive's
holdings are fully processed, have finding aids, and are under archival
control. This level of processing also allowed for existing contextual
information to be used by the teams.
The civil rights materials are among the most prized and high-
profile holdings in Special Collections. The recent founding of the
Archive garnered attention from campus and activist communities,
and received local press coverage as well. From a donor relations
viewpoint, digitizing these collections would generate goodwill within
the alumni activist community that contains future donors and
supporters. The Archive's living donors are still very interested in
their materials and legacies, and have repeatedly expressed strong
interest in having their materials digitized.
The semantic team also conducted a review of existing civil rights
digital archives, which showed many worthy efforts already underway.
The University of Southern Mississippi, which holds one of the largest
archival collections about civil rights work in Mississippi, created the
Civil Rights in Mississippi Digital Archive, an "Internet-accessible,
fully searchable database of digitized versions of rare and unique
library and archival resources on race relations in Mississippi"
(University of Southern Mississippi Special Collections, 2006a). The
Civil Rights Digital Library, hosted by the University of Georgia, is "a
partnership among librarians, technologists, archivists, educators,
scholars, academic publishers, and public broadcasters" that provides
federated searching of digital civil rights materials from almost 100
different institutions (Digital Library of Georgia, 2009).
These two excellent examples, among dozens of others, illustrated
two major reasons why the semantic team chose to digitize materials
from the Civil Rights Archive, rather than from other special
collections. First, it was found that there exists a vigorous online
community that any institution holding archival civil rights materials
must join to remain relevant and accessible. Second, the team found a
paucity of material that relates specifically to Northern contributions
to the civil rights movement, and thus digitization of the Queens
College Civil Rights Archive would add new perspectives to the online
community, in keeping with the Archive's stated mission of engaging
the broader archival civil rights community.
The copyright status of many of the materials in the Civil
Rights Archive was clear, because photographs and personal papers of
known provenance could be easily cleared by the creators with whom
the Department has a relationship. Many of the other areas in special
collections have a murkier copyright status, which is a major potential
impediment to digitization.
Finally, the civil rights materials were primarily processed by
archival students from the Queens College Graduate School of Library
and Information Studies as part of the Special Collections Fellowship
program. It seemed natural to continue the collaborative effort
between the library school and Special Collections by having the
student-processed materials brought into the digital realm by Fellows
as well.
Digitization
The actual scanning of the items was not the focus of this pilot
project. A proper scanning procedure that will create archival images
in the TIFF format, such as that developed by the University of
Southern Mississippi (University of Southern Mississippi Special
Collections, 2006b), will be developed during Phase II in conjunction
with implementation of a digital asset management system. The items
digitized for this project were scanned with an Epson 10000XL
scanner using Adobe Photoshop CS3. After cropping, deskewing, and
adjustments to contrast and level, the images were saved as 300-dpi
Web-optimized jpegs and provided to the technical team.
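As a rough illustration of what a 300-dpi scan implies, the pixel
dimensions of an image follow directly from the physical size of the
original. The sketch below is not part of the project's workflow; the
function name and the letter-size example are assumptions chosen only
for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi=300):
    """Return (width, height) in pixels for an original scanned at `dpi`.

    Hypothetical helper: sizes are given in inches.
    """
    return round(width_in * dpi), round(height_in * dpi)


# A letter-size page (8.5 x 11 inches) scanned at 300 dpi.
print(pixel_dimensions(8.5, 11))  # (2550, 3300)
```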
Metadata
The development of a robust metadata schema was a primary goal
of this project. The semantic team examined a range of available
schemas, such as METS, MODS, and PREMIS, but it very quickly
became clear that Dublin Core (DC) was ideal for a variety of reasons.
Unqualified DC is compatible with basic Omeka; it is a simple and
easily-understood schema; it can be extended by using qualified DC; it
is well-established; and similar projects are using it. The last point
was a particular influence, both because the Civil Rights Archive hopes
to collaborate digitally with other institutions and because there is
a wealth of relevant resources available. The two key sources used
by the semantic team were published DC schemas from the University
of Southern Mississippi and the North Carolina Exploring Cultural
Heritage Online project (Graham & Ross, 2003; NC ECHO, 2007).
The semantic team created a detailed qualified Dublin Core
schema, but after much discussion the teams decided to work with an
unqualified schema, because that was the Omeka default. The
semantic team continued to create qualified DC, so that in the future
the project will be able to implement this more detailed schema.
The project utilized many controlled vocabularies. Library of
Congress Name and Subject Authorities were used for person and
subject terms; DCMI and IANA for type and digital format,
respectively; and ISO 8601 for dates. For the analog Medium field,
the Getty's Art and Architecture Thesaurus proved most useful.
Geographic data were placed in the Coverage field. LC subjects were
used for the general geographic area, such as the town or state, while
latitude and longitude data were taken from Google Maps by manually
entering a known address and harvesting the geospatial data
provided. The technology team was able to use the latitude and
longitude data to create a Google Map reflecting the geographic
location of the digitized items.
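Controlled date values are easy to check mechanically. The following
is a minimal sketch, assuming dates are recorded at one of three
ISO 8601 granularities (year, year-month, or full date); the function
name and the sample values are hypothetical and were not part of the
project.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD (an assumed subset of ISO 8601).
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def is_valid_dc_date(value):
    """Return True if `value` is a well-formed, possible calendar date."""
    match = _DATE_PATTERN.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 for the calendar check.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


print(is_valid_dc_date("1964-06-21"))  # True
print(is_valid_dc_date("06/21/1964"))  # False
```

A check of this kind could be run over a batch of records before they
are entered, so that malformed dates are caught early.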
The teams both felt it important to offer maximum searchability,
and the semantic team thus also provided the full text of digitized
items, using OCR software. This proved to be a time-consuming
additional step, in particular the proofreading of the computer-
generated text. The print quality of some materials was quite poor,
and the many drawings and photographs were also difficult for the
software to interpret. Better software and more experience creating
OCR text should make this easier as the project moves forward.
Omeka also has the ability for users to add tags. The teams
decided to take selected controlled-subject values from the Dublin
Core metadata and use them as tags as well, to allow testing of
features such as the tag cloud. The teams engaged in an interesting
discussion about how tags would be used in this project - some
wanted to keep the vocabulary controlled, while others wanted to
encourage users to add tags as they saw fit. The teams decided to
allow tags to be used as user-generated metadata, in keeping with the
Web 2.0 spirit of Omeka.
The metadata was created manually by the semantic team in an
MS Word table, and transferred into Omeka by the technology team.
A future goal is an automated process for metadata transfer.
Implementation - Technology Team
The technology team for the project consisted of two independent
study graduate students, Valery Chen and Jing Si Feng, as part of the
Special Collections Fellows program, and their instructor and project
coordinator Dr. Ng. These Fellows began working on the project after
the first meeting with the semantic team on February 16, 2010. Over
the next ten weeks, the Fellows downloaded, installed, and modified
Omeka, an open-source web-publishing system, and completed the
first phase of the project on May 13, 2010.
Why Omeka?
The purpose of this project was to create a web presentation to
showcase the unique and valuable holdings of the Queens College Civil
Rights Archive, and at the same time provide the Fellows an
opportunity to learn how to build a digital archive using a
web-publishing system. In any project, it is important to weigh the use
of a proprietary system against a nonproprietary or open-source one.
Omeka is a relatively new software package that describes itself as a
web-publishing platform on its website:
Omeka is a free, flexible, and open source web-publishing
platform for the display of library, museum, archives, and scholarly
collections and exhibitions. Its "five-minute setup" makes launching
an online exhibition as easy as launching a blog. Omeka is designed
with non-IT specialists in mind, allowing users to focus on content
and interpretation rather than programming. It brings Web 2.0
technologies and approaches to academic and cultural websites to
foster user interaction and participation. It makes top-shelf design
easy with a simple and flexible templating system. Its robust open-
source developer and user communities underwrite Omeka's stability
and sustainability. (Omeka, 2010, Project section.)
Omeka is an open-source web-publishing system developed by
the Center for History and New Media at George Mason University.
According to the Omeka web site (Center for History and New Media,
George Mason University, 2010), Omeka is easy to install, allows great
flexibility for a customized web interface, and supports multiple
plugins. All these features were appealing for this project.
Another compelling reason to choose Omeka was the potential
inherent in the exhibit feature. Omeka has the Web 2.0 ability of
allowing users to create their own exhibits from the digital collections.
A primary goal of the Civil Rights Archive is to engage the educational
community and encourage use of its materials. By providing digital
surrogates and contextual information, this website would allow a
teacher or professor to tailor their use of the materials in an exhibit,
and also make them accessible to other educators seeking similar
uses. This type of educational contextualization was a key reason to
digitize the collection.
Description of Phase I (Feb 16, 2010 - May 13, 2010)
This section discusses the installation of Omeka, the addition of
various plugins, and the details of the most heavily manipulated
pages. It also examines particular technical problems and solutions
encountered during Phase I of this project.
Installation
Initially, separate Omeka instances were created for each Fellow
on the technology team to experiment independently. After both
instances were adequately developed, the best features were selected
from each and transferred over to a new Omeka installation, running
on version 1.2.
Omeka 1.1, the latest version available at the time of the first
installation process, was downloaded by the technology team and
installed. The Omeka system was in a zip file, and the technology
team had to unzip the file to extract all the necessary files for
installation. Each Fellow downloaded the zip file and unzipped the
files successfully. The next step was to connect to the server remotely
and upload the files for installation. Omeka consists of thousands of
files, and Adobe Dreamweaver could not handle such a massive
upload. Using an FTP client, such as FireFTP, was found to be the
best practice. FireFTP supports large uploads and does not terminate
in the middle of an upload. If termination does occur, FireFTP
automatically reconnects to the server to continue with the upload.
After installation, the Omeka installation folder was removed from the
server with a bash shell script for security. Administrator and Super accounts were
created, and the system was up and running.
Creating Items, Tags, and Collections
A collection can be created in the Omeka Admin page by filling in
the name and description of a collection. An item can be created by
filling in the Dublin Core fields in the Omeka Admin page and adding
the item to a collection. Tags can be added to each item to create
more access points.
All image values in the Omeka General Setting should be defined
before importing any image files. The values Fullsize Image Size,
Thumbnail Size, and Square Thumbnail Size are crucial for Omeka to
generate image output. Omeka automatically generates full size image
and thumbnails during item creation. The technology team decided to
change the full size image output and thumbnail size in the middle of
the project, thus resulting in two different image sizes throughout the
site. This meant that all the files needed to be uploaded again at the
conclusion of the project, to ensure uniform image sizes.
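The effect of changing a size setting mid-project can be seen with a
little arithmetic. The sketch below assumes that a derivative is
produced by scaling the longer side of the original down to a maximum
value while preserving the aspect ratio; this is an illustration of
that general technique, not Omeka's own code, and the sizes 800 and
600 are hypothetical.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) so the longer side is at most `max_side`.

    The aspect ratio is preserved and small images are never enlarged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


original = (2550, 3300)               # a letter-size page at 300 dpi
print(fit_within(*original, 800))     # (618, 800)
print(fit_within(*original, 600))     # (464, 600)
```

Items imported before and after such a change end up with different
derivative dimensions, which is why the files had to be uploaded
again.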
Selecting a Theme
The public interface of Omeka is controlled by the files inside
the "themes" folder. The Super can log on to the Omeka Admin page
to choose a desired theme. More themes can be downloaded from the
Omeka website and uploaded to the server. Each Fellow selected a
different theme for an individual pilot Omeka site. One chose
"santa-fe" while the other chose "spring."
Plugins
Many plugins are available for download from the Omeka
website. The technology team installed Geolocation, Simple Pages,
Dublin Core Extended, Dropbox, ExhibitBuilder, and Lightbox for the
pilot site. Several outside interactive effects were also installed,
including animated collapsible panels, text truncation, and a
slideshow. Most of the plugins were easy to use and install without
any hassles; the plugins specifically named above are discussed in
more detail below. One plugin issue was that some of the plugins
were written in plain JavaScript while others were written using
jQuery. This often created clashes in the code, because the dollar
sign symbol ($) was used for different purposes by different scripts.
In plain JavaScript, $ is an ordinary identifier that any script may
define as a variable or function, while jQuery claims $ as shorthand
for its main function. Since jQuery is itself a JavaScript library and
all scripts on a page share the same global names, the double meaning
of the $ was problematic.
Geolocation
Geolocation uses features from Google Maps and allows users to
view items in various geographic locations by clicking on the balloon
pointers. The technology team decided to install this plugin because it
gives users a visual, spatial representation of the materials in the
archive. This visual aid can be useful for quickly identifying locations
of particular interest, and it allows users an additional non-textual
method of accessing digitized items.
In order to use the Google Map functionality, the technology team
first had to obtain a Google Maps API (application programming
interface) key by creating an account with Google. Once the key was
obtained, the team downloaded the packaged Google Map plugin from
Omeka and uploaded it to the server. The key was then supplied
through the administrator interface to activate the plugin. From the
administrator interface, it is possible to customize the Google Map
API. For example, the administrator can set the default location,
which is controlled by latitude and longitude, and the zoom level
ranging from 1 to 20, where 1 displays a view of the Earth and 20
displays a street level map.
The geographic location was added manually through the
administrator interface. The latitude and longitude were entered for
each digital item's metadata. The technology team hopes to create a
plugin in the future that will automatically locate the spatial location
information from the Dublin Core metadata and supply it accordingly
so that as items are uploaded they will automatically appear on the
map.
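The core of such an automated step would be pulling a coordinate pair
out of the Coverage values. The following is a minimal sketch of that
logic, assuming the coordinates are stored as a single
"latitude, longitude" string alongside the LC place names; the
function name, the storage format, and the sample values are all
assumptions, and an actual Omeka plugin would be written in PHP.

```python
import re

# A decimal "lat, lon" pair, e.g. "40.7366, -73.8203" (assumed format).
_COORD = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*")


def extract_coordinates(coverage_values):
    """Return the first valid (lat, lon) found in the values, else None."""
    for value in coverage_values:
        match = _COORD.fullmatch(value)
        if not match:
            continue  # e.g. a place name rather than coordinates
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return lat, lon
    return None


coverage = ["Flushing (New York, N.Y.)", "40.7366, -73.8203"]
print(extract_coordinates(coverage))  # (40.7366, -73.8203)
```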
All items with geographical coordinates are represented by the
default red pin. When users click on the marker, the corresponding
item will pop up in a balloon with limited metadata identifying the
item. Users can then click on the balloon, which brings them to the
particular item's display page. The team plans to enhance the map by
creating a color-coding scheme that will correspond to an item's
categorization. For example, items relating to a protest could have a
red marker, while Freedom School items could have a blue marker.
This functionality will add additional visual meaning to the map.
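A color-coding scheme of this kind amounts to a lookup from an item's
category to a marker color. The sketch below shows one possible
shape for that lookup, assuming the category is taken from the item's
tags; the tag names, the gray fallback, and the function name are
hypothetical.

```python
# Hypothetical mapping from a category tag to a marker color.
MARKER_COLORS = {
    "protest": "red",
    "freedom school": "blue",
}
DEFAULT_COLOR = "gray"  # assumed fallback for uncategorized items


def marker_color(tags):
    """Return the color for the first tag that has a mapping."""
    for tag in tags:
        color = MARKER_COLORS.get(tag.strip().lower())
        if color:
            return color
    return DEFAULT_COLOR


print(marker_color(["Flyers", "Freedom School"]))  # blue
```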
Simple Pages
Simple Pages allows the Super to enter HTML code directly on the
admin page. The team used the Simple Pages plugin
to create the "About Us" section of the website.
Dublin Core Extended
Omeka defaults to Unqualified Dublin Core. A Dublin Core
Extended Plugin can be installed to create more descriptive fields. A
primary consideration when installing Dublin Core Extended is
whether all the additional fields are necessary for a given project. Too
many fields slow down the data entry process and scrolling time, and
since all the Omeka data entries are web-based, a wrong click by the
mouse can result in loss of data. The teams hope to take advantage of
qualified Dublin Core in Phase II.
Dropbox
Omeka uses a web-based data entry system; each field needs to be
typed in separately. A Dropbox plugin can be installed to facilitate
bulk file uploads, allowing multiple files to be uploaded at the same
time into a Dropbox Folder. When adding a digital item to the
collection, the user can then select that item from the Dropbox.
Dropbox also supports the creation of multiple items, and allows bulk
creation of Tags, Collections, and Titles. Other Dublin Core fields still
need to be entered separately.
ExhibitBuilder
ExhibitBuilder was a more complex installation, as the
architecture of the exhibit pages had to first be determined. The
structure of the exhibit was defined as a Main Title Page with
descriptions, a Sub-Section with descriptions, and individual pages
with items in the layout of choice. The interface of an exhibit can be
changed by accessing the screen.css file under the ExhibitBuilder
directory.
Lightbox
In order to enhance the user experience, a Lightbox was installed
for viewing full size images. The Lightbox feature creates an overlay
around an image and enhances the clarity of the image in the center.
The latest version of Lightbox was downloaded from
http://www.huddletogether.com/projects/lightbox2/, and the installation
process described on the Omeka Documentation page was closely followed
(Ebellempire, 2009).
Heavily Manipulated Files
This section describes two particular areas of the Omeka platform
that were heavily edited by the technology team.
Item Display Page
The default display page for each item is a non-stylized,
exhaustive list of metadata that requires extensive vertical scrolling.
Moreover, some of the DC and Omeka metadata are duplicative. In
order to create a more user-friendly display of information, multiple
files were heavily manipulated and outside plugins were incorporated
to make the display page more interactive.
The technology team found it challenging to determine which files
control the item display page, because there are many files with the
same name in different directories. However, once the structure of
how Omeka arranges its files was ascertained, the team was able to
manipulate the code and customize the display.
The first step after locating the correct file was to understand the
predefined functions created by Omeka. This was achieved with the
help from the documentation on the Omeka website. This stage was
time consuming, as the team used a trial-and-error approach to learn
how to use the different available functions. Once it was determined
which function handled the selection of individual metadata, the team
was able to control where each piece of metadata would be displayed.
The display page was designed so that the most important
information was provided in a clear, easy-to-read format. To
minimize the scrolling, certain technical metadata was hidden from
the user. The technology and semantic teams jointly examined each
DC and Omeka data field, and decided on a short list of fields to
display. These included the collection, subject, tag, and rights fields.
Creator, date, location and citation were included under the item's
thumbnail image. Hidden fields included type, source, publisher,
upload date, and extent. These fields were merely hidden - users may
click on the "show" link to reveal the hidden metadata if desired. The
show-and-hide functionality was made possible by installing jQuery, a
lightweight JavaScript library, and the appropriate toggle-display
code. The teams agreed that while an average user would not need to
see this information on first viewing a page, more sophisticated users
might want the option. The semantic team in particular found in its
review of similar web projects that many sites did not include
technical metadata, to the disappointment of archivists and librarians
seeking to learn from how other institutions handle metadata.
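The division between displayed and hidden fields can be expressed as
a simple partition of an item's metadata. The sketch below illustrates
only that logic; the project implemented it in Omeka's PHP templates
with jQuery for the toggle, and the field labels and function name
here are assumptions based on the fields listed above.

```python
# Fields hidden by default, following the list given in the text.
HIDDEN_FIELDS = {"Type", "Source", "Publisher", "Upload Date", "Extent"}


def split_for_display(metadata):
    """Split a metadata dict into (visible, hidden) dicts."""
    visible = {k: v for k, v in metadata.items() if k not in HIDDEN_FIELDS}
    hidden = {k: v for k, v in metadata.items() if k in HIDDEN_FIELDS}
    return visible, hidden


item = {"Title": "Flyer", "Rights": "Queens College", "Type": "Text"}
visible, hidden = split_for_display(item)
print(sorted(visible))  # ['Rights', 'Title']
print(sorted(hidden))   # ['Type']
```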
If a particular item had an accompanying full text, it was made
available to the user. However, since a typical full text would cover an
extensive area of the display page, the technology team used a
truncation function to display a snippet of the full text along with a
link to show more. This was also accomplished through jQuery.
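A truncation function of this sort is short. The following sketch
shows one common approach, cutting at a character limit and then
backing up to a word boundary; it is an illustration in Python rather
than the jQuery code the team used, and the default limit of 200
characters is an assumption.

```python
def truncate(text, limit=200, suffix="..."):
    """Return `text` shortened to at most `limit` characters plus a suffix."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space so a word is not split in half.
    if " " in cut:
        cut = cut[:cut.rindex(" ")]
    return cut.rstrip() + suffix


print(truncate("The quick brown fox jumps", limit=12))  # The quick...
```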
While most of the display page reorganization was controlled by
the php files, CSS files controlled the final style touches, such as
spacing, font sizes, and background color.
Homepage
The default homepage also underwent a major redesign. Instead
of the default static thumbnails representing the featured item, the
team added a dynamic slideshow with captions that rotate through the
featured items. The initial plan was to implement the slideshow using
Adobe Flash. However, that idea was discarded as inefficient because
new .swf files would have to be created for each new image added
and each image taken down. Therefore, the best solution with the
least amount of maintenance was to find a method to automatically
pull images for display. The team achieved this by installing a jQuery
slideshow. These images are thus easily maintained by a few lines of
code.
The technology team also added a right sidebar that allowed users
to browse by various categories, including tag, date, location, subject,
and type. The sidebar was initially occupied by a list of recently
accessed items, but in conversation with the semantic team it was
decided that creating multiple access points would be more useful to
the user for navigation purposes. Currently, the list of categories is
manually created through simple HTML code. The technology team
hopes to automate the generation of categories in this sidebar, to
simplify site maintenance.
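Automating the sidebar would mean generating the list markup from the
values already present in the item metadata. The sketch below shows
the general idea, assuming a browse URL that filters on a field name
and value; the URL pattern, the function name, and the sample tags are
hypothetical and do not describe Omeka's actual routing.

```python
from html import escape
from urllib.parse import quote


def browse_links(field, values, base_url="/items/browse"):
    """Build an HTML list of browse links, one per distinct value."""
    items = []
    for value in sorted(set(values)):
        # Percent-encode for the URL, then HTML-escape for the attribute.
        href = f"{base_url}?{quote(field)}={quote(value)}"
        items.append(
            f'  <li><a href="{escape(href)}">{escape(value)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(browse_links("tag", ["protest", "Freedom School", "protest"]))
```

Because duplicates are removed and values are sorted, the list stays
current as items are added, without hand-editing the HTML.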
Goals and Discussion
Phase II Goals
The teams have many goals for Phase II. The stabilization and
scalability of the project are a top priority, and this includes
development of a scanning process and a streamlined metadata
creation process, as well as finding stable server space and
implementing a digital asset management system behind the Omeka
website.
In terms of Omeka, the technology team hopes to add more
sophisticated features to the map, such as item categorization. On the
metadata side, it will implement qualified Dublin Core, and may
attempt to install the OAI-PMH Harvester. Finally, a long-term goal
is the creation of a plugin that would allow use of TEI.
Discussion
This pilot project identified issues that might be similar in other
institutions with limited resources. The three key lessons from Phase
I of this project relate to technical support, metadata, and the
involvement of graduate students.
Technical Support
Proper server space is absolutely vital for any digitization project.
The extremely limited IT resources available at Queens College mean
that this pilot project is limited in its ability to grow beyond Phase I.
This also impacts the Department of Special Collections' ability to
implement a digital asset management system. If continued efforts to
obtain College server space are unsuccessful, the project will examine
other options such as approaching another CUNY school for a
partnership, or perhaps even renting commercial server space.
Metadata
The semantic team spent the majority of its time and effort on
metadata, both in developing the schema and creating the actual
records.
Creating metadata from scratch is extremely time-consuming.
For each individual image, the team created titles and descriptions,
applied a variety of controlled vocabularies, harvested geographic
data, and proofread OCR text. With the current human resources, this
process would be impossible to scale up. Thus, in Phase II the teams
will attempt to streamline the process by automating some aspects of
metadata creation. For example, technical specifications could be
automatically imported and drop-down menus created for oft-
repeated controlled vocabulary terms. Once a robust and well-
described process is in place, Special Collections Fellows working on
civil rights materials could be recruited to enter much of the metadata,
which would provide the administrative team time to engage in other
tasks, such as selection, quality control, and project planning.
Another issue the teams discussed was the extent to which
existing archival description is helpful. As archival description by
definition is not item-level, existing finding aids were only of limited
use in this pilot project, primarily providing contextual information.
In the future, however, incorporation of EAD finding aids would certainly
allow additional types of categorization and searching.
Bulk Metadata Creation in Omeka
In order to facilitate the slow process of metadata entry for each
item in Omeka, the technology team is experimenting with CSV
Import in Phase II of the project. CSV Import, a plugin that handles
bulk metadata creation, allows each Dublin Core field to be imported
via a comma-separated file. However, more technical support is
needed to define the PHP-CLI path setting with the server
administrator.
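Preparing the comma-separated file itself is straightforward. The
sketch below shows how a set of Dublin Core records might be written
out as CSV text using Python's standard csv module; the column names
and the sample record are assumptions, and the headers would need to
be matched to whatever field mapping the plugin expects.

```python
import csv
import io

# Assumed column set: a few unqualified Dublin Core elements.
DC_COLUMNS = ["Title", "Creator", "Date", "Subject", "Coverage", "Rights"]


def records_to_csv(records):
    """Serialize a list of metadata dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DC_COLUMNS)
    writer.writeheader()
    for record in records:
        # Missing elements become empty cells; extra keys are dropped.
        writer.writerow({col: record.get(col, "") for col in DC_COLUMNS})
    return buffer.getvalue()


sample = [{"Title": "Flyer, 1964", "Date": "1964"}]
print(records_to_csv(sample))
```

The csv module quotes values that contain commas, so a title such as
"Flyer, 1964" remains a single cell.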
Graduate Students
The collaboration of the Department of Special Collections with
the Graduate School of Library and Information Studies was beneficial
to both parties. Special Collections was able to lay the groundwork for
future digitization and also to create a prototype Omeka website to
show potential donors and partners, while the graduate Fellows
gained valuable skills in actual implementation of a sophisticated
software package. It should be noted that such a collaboration
requires continuous interaction between and dedication from both the
library school faculty and Department of Special Collections, to
maintain standards and momentum as different classes of Fellows
participate in the semantic and technology sides of the project.
References
Center for History and New Media, George Mason University. (2010).
Omeka: about. Retrieved from http://omeka.org/about/
Ebellempire. (2009). Adding Lightbox to Omeka. Retrieved from
http://omeka.org/codex/Adding_LightBox_to_Omeka
Digital Library of Georgia. (2009). Welcome to the Civil Rights Digital
Library. Retrieved from http://crdl.usg.edu/?Welcome
Graham, S. R. & Ross, D. D. (2003). Metadata and authority control in
the Civil Rights in Mississippi Digital Archive. Journal of Internet
Cataloging 6(10), 33-42.
NC ECHO. (2007). North Carolina Dublin Core implementation
guidelines. Retrieved from
http://www.ncecho.org/dig/ncdc2007.shtml
University of Southern Mississippi Special Collections. (2006a). About
the Civil Rights in Mississippi Digital Archive. Retrieved from
http://www.lib.usm.edu/~spcol/crda/about.htm
University of Southern Mississippi Special Collections. (2006b).
Guidelines for digitization. Retrieved from
http://www.lib.usm.edu/~spcol/crda/guidelines/index.html
Digitization on a Dime: How a Small
Library and a Big Team of Volunteers
Digitized 15,000 Obituaries in Just Over
a Year
Elizabeth Goldman (Kingston Frontenac Public Library)
Abstract
In 2006 and 2007, Chelsea District Library, a small public library in
Chelsea, Michigan, digitized a collection of 15,000 obituaries on a
small budget by using a staff of nearly 50 volunteers and open source
software. The author describes the research and planning that led up
to the project; unique aspects of the staffing and technology for the
project; and the resulting database, which contributed to the library
being named "Best Small Library in America" for 2008 by Library
Journal. The database continues to be updated, expanded, and
improved, and the use of volunteers as the primary workforce has had
long term rewards for the library. The chapter demonstrates the
accessibility of digitization projects to libraries, even those without
pre-existing expertise, large staffs, or big budgets.
Keywords: Database, Digitization, Genealogy, Obituaries, Open
source, Volunteers.
Introduction and background
Especially in small towns, the public library may serve multiple roles
as library, museum, archives, and community center. In many cases,
libraries accept donations of rare or unique historical material from
families in the area with little thought for long-term maintenance.
Technological advances in the last few decades have given libraries
new options for preserving local history collections and making them
more accessible through digitization. However, while even the smallest
libraries have staff educated in reference and circulation procedures,
relevant training in archival methods and technology is rare, making
the prospect of a digitization project overwhelming. At the same time,
small public libraries often have little or no funding for such projects.
Chelsea District Library is a single-branch public library serving
14,000 people in southeast Michigan, about 50 miles west of Detroit.
The library was established in 1932 by the local Women's Club and, in
1999, became a district library system serving both the town of
Chelsea (population 5,000) and surrounding townships. The earliest
settlements in the Chelsea area date back to the 1830s and many local
families can trace their roots back to the town's founders, resulting in
a rich history. From its earliest days, the library served as a repository
for local historical and genealogical material, housing a local history
room on the cramped third floor of the McKune House, its location
from 1959-2000. In 2006, after extensive renovations and additions
to the McKune House, the library moved from temporary quarters
back to its historic home on Main Street in Chelsea, leading to
renewed interest in the local history collection.
A collection of about 50,000 index cards known as the Family
History Index made up a major component of the local history
material. A retired lawyer and amateur genealogist named Harold
Jones started the collection as a hobby, clipping obituaries from the
local Chelsea Standard newspaper and other sources and pasting them
onto 4x6 index cards, along with cross references that allow women
to be located by maiden name. Upon his death in 1987, Jones' family
donated the collection to the library, where it received extensive use
by local and visiting genealogists. Library staff and volunteers
completed a project from 2000-2002 to clean up and update the
collection; since then, volunteers have continued adding new clippings.
As part of a planning process leading up to an election to fund an
expansion of the building and the staff, the library identified local
history as a priority for Chelsea area taxpayers and the Family History
Index as a prime candidate for digitization work. At the time, the
professional librarian staff consisted of the director, three department
heads (adult services, youth services, technology services), and a part-
time librarian, leaving few resources to focus on a project of this
scope. In the spring of 2005, the library replaced the departing part-
time librarian with a full-time librarian (the author), adding
additional duties of managing the library's website and digitizing the
Family History Index.
This paper will describe how a librarian and a team of four dozen
volunteers completed the digitization of 15,000 records from the
Family History Index in just over a year, resulting in a highly usable
database that helped Chelsea District Library earn its distinction of
"Best Small Library in America" for 2008 from Library Journal and
the Bill and Melinda Gates Foundation. The first section will describe
preparation for the project, including research in archival standards
and digitization techniques, as well as the recruitment and training of
the volunteer workforce. Section two will describe the decision-
making process that went into the choice of open source software in
order to create a user-friendly, free database of the records on a
limited budget, as well as the work of developing and testing the
database itself. Section three will offer an overview of the workflow for
staff and volunteers as they did data entry, scanning, and
proofreading of the records. Finally, the paper will describe the
resulting database, current upkeep and expansion, and how the
project served as a model for additional digitization work.
Project preparation
Research
Initial research focused on archival and digitization standards. No
one on the staff had a specific background in either archives or
genealogy that would fit the requirements of the project, so research
started from a very basic level. Research was conducted largely online
and in books, as well as by speaking with archivists and libraries that
had done similar projects. Internal research played a role, too:
understanding how the Family History Index had historically been
used and its importance to the community; exploring the scope, size,
and fragility of the collection; and agreeing on project goals.
At the time, in mid-2005, quite a few libraries had launched
efforts to put obituary indexes online for use by genealogists and
historical researchers. The vast majority of these projects resulted in
static webpages or simple databases that provided access to citations
but not complete text. This early wave of digitization projects
represented an important first step on the path toward full electronic
access by giving researchers more complete information about the
contents of a library's collection. However, those who found an
obituary citation online still had to contact the library to obtain more
detailed information or the complete text of the obituary.
Chelsea District Library was lucky enough to have a collection
that included full-text clippings of obituaries on a significant portion
of the cards in the Family History Index. Because of this and advances
in technology, one of the library's primary goals was to make complete
text available freely online, meaning genealogists and researchers had
at least the possibility of meeting their research needs without leaving
their computer or contacting the library at all. To further this goal, the
library sought to develop a database that was free, simple to use, and
contained relatively small image files accessible even by those with
dial-up internet connections.
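The practical meaning of "relatively small" can be estimated with
simple arithmetic. The sketch below assumes a nominal 56 kbps dial-up
link and decimal kilobytes; real-world throughput is typically lower,
and the function name and the 100 KB example are hypothetical.

```python
def download_seconds(file_kilobytes, link_kbps=56):
    """Estimate transfer time in seconds at the nominal link speed."""
    bits = file_kilobytes * 1000 * 8      # kilobytes -> bits
    return bits / (link_kbps * 1000)      # bits / (bits per second)


# A 100 KB image over a 56 kbps connection takes roughly 14 seconds.
print(round(download_seconds(100), 1))  # 14.3
```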
Important sources, although they sadly have not been updated in
recent years, were the book and accompanying website, "Moving
Theory into Practice," and the Making of Modern Michigan project.
Moving Theory into Practice: Digital Imaging for Libraries and
Archives by Anne R. Kenney and Oya Y. Rieger (Mountain View, CA:
Research Libraries Group, 2000) provided a good grounding in what
to consider when planning a digitization project, as well as minimum
standards for use and long-term preservation. An online tutorial
housed at Cornell University Library (2010) offered step-by-step
instructions for taking a project successfully through to completion.
The Making of Modern Michigan was an IMLS-funded joint effort by
the Michigan State University Library, the Library of Michigan, the
Michigan Library Consortium, and others (Michigan State University
Libraries, 2005). Although its heyday had already passed by the time
Chelsea embarked on its project, the website provided helpful background
information on the structure of digitization projects and, especially,
scanning equipment that had been vetted. The scanner models in the
Making of Modern Michigan are no longer produced and libraries may
have moved to higher minimum DPIs for scanning, but the advice
offered at both sites remains sound.
Two more recent resources, available freely online, are BCR's
CDP Digital Imaging Best Practices, from the Bibliographical Center
for Research in Colorado (Collaborative Digitization Program, 2008)
and NISO's IMLS-funded A Framework of Guidance for Building
Good Digital Collections (National Information Standards
Organization, 2007), which within the framework includes links to
many other resources on more specific topics.
Project scope
Armed with this information, Chelsea District Library took a look
at the Family History Index. While the cards had been stored away
from the light in filing drawers, they also had been used heavily by
patrons over the years, resulting in some wear and tear. In addition,
neither the index cards nor the glue was acid-free, and newsprint is
one of the most acidic papers. The oldest cards and the oldest
clippings dated back to the 1950s, resulting in some that were in very
delicate condition. Along with the unique and irreplaceable nature of
the collection, this delicate condition led the library director to decide
that the cards should be scanned manually in-house rather than sent
to a contractor who would likely feed them into a scanner, possibly
resulting in damage.
This decision meant significant labor for library staff members
and volunteers. In May 2005, the library applied for a grant from the
State of Michigan to fund the digitization, which would have allowed
for the hiring of contractors, but the application was turned down.
Luckily, the library already had a well-established volunteer services
program. Lacking any funding beyond $5,000 committed by the
library, the project manager developed a volunteer-driven plan and
made the choice to seek an open source solution for the database,
resulting in savings on the equipment side. An RFP was opened to
competitive bidders in the fall of 2005, with selection of a contractor
and design and testing of the database completed by early 2006.
During this time, the library also made decisions about the scope
of the project. While the collection itself was estimated to consist of
more than 50,000 cards, closer to 25,000 obituaries were
represented, due to Jones' system of cross-referencing women by
maiden name. A database eliminated the need for this. The remaining
set of obituaries came from a number of sources:
• gravestone transcriptions from Chelsea's three cemeteries and
several others in neighboring areas
• notes culled from early histories of the area, as well as
scrapbooks and other material in the local history collection
• notes from death notices published in the local newspaper, The
Chelsea Standard, taken from microfilmed versions of the
paper, dating from about 1887-1950
• complete obituaries from The Chelsea Standard, clipped and
pasted on cards, dating from about 1950 to the present
• complete obituaries from the newspapers in two nearby cities,
Ann Arbor and Jackson, clipped and pasted on cards, dating from
the 1970s and 1980s.
Of this material, the first three could be included in the project
without further consideration of copyright, as the donation of the
Family History Index to the library included rights to copying the
material. The Chelsea Standard, a weekly publication owned by
Heritage Newspapers, supported the project from the beginning,
granting copyright release for material originally printed in its pages.
Unfortunately, the publisher of both the Ann Arbor News and
Jackson Citizen-Patriot declined to grant copyright permission. The
library considered this a minor setback, as obituaries from those two
newspapers covered a span of only about 20 years. The impetus for
collecting from newspapers in the neighboring cities was that some
Chelsea area residents chose to publish obituaries only in these
publications. The number of citizens who fit this scenario, however,
was greatly outweighed by the number of obituaries of people who had
no connection to the library's primary service area at all. The library
considered entering data from these obituaries but refraining from
scanning them, which would not have violated copyright law, but
chose to leave the cards for a potential future project and focus
instead on truly local residents.
Staffing
At this project's initiation, Chelsea District Library had about
twenty employees. The project could not be completed by paid staff,
and the small budget precluded the hiring of contract labor.
The library had a well-established and strong volunteer program,
including a tradition of volunteers working with the Family History
Index. Started as a volunteer effort in 1932, the library had always had
strong support from volunteers, and genealogy in general is a topic
that draws volunteers. Nevertheless, bringing in volunteers double the
size of the library's own staff would not have been possible without a
coordinator, who had developed procedures, documentation, and
processes for intake, training, and evaluation.
Building on this strong foundation, the library advertised through
its newsletter, the local newspaper, word-of-mouth, and presentations
to organizations such as the county genealogical society. The Family
History Index was a well-used collection and one of the best sources
for obituaries for the area, so the library was able to draw in
volunteers who did not live in the Chelsea district itself. Volunteers
filed standard application forms, which covered basic contact
information, times available for work, and special skills. The initial
group of about three dozen volunteers received training at one of two
sessions set up in early February 2006, after which additional
volunteers received training one-on-one or in small groups as they
signed on. Later, existing volunteers would train new recruits. Over
the course of the project, nearly 50 volunteers contributed to various
aspects of the project.
Database development
In considering technology options, the library's predominant
constraints related to both funding and expertise. In 2005, Chelsea
District Library contracted most of its technology services to the
library cooperative of which it was part, including website hosting.
The library itself at the time had only six public computers and about
a dozen non-networked staff computers, with the single on-site server
running the public computer time management system. No one on
staff had knowledge of server administration, leaving staff nervous
about hosting a server for the digitization project but also open to any
of a number of configurations.
Open source software has seen increasing adoption by public
libraries in recent years. While often referred to as "free," open source
software is monetarily free only in the sense that to obtain a copy of
the code requires no exchange of funds. In the truer sense, "free"
refers to the user's freedom to view and adapt the software, generally
with an agreement to then share improvements with the larger
community. Chelsea District Library initially considered both
proprietary and open source options for this project, as well as both
in-house and contract solutions. While the librarian hired as project
manager had extensive experience with Microsoft Access, the
licensing costs for the accompanying Microsoft SQL Server allowing
multiple simultaneous users made that option prohibitive. An open
source database based on PHP and MySQL appeared to be a more
realistic option, with the drawback that no one on staff had the
requisite familiarity with these programming languages. The library
decided to solicit requests for proposals to get a better sense of its
options. Replies to the RFP highlighted the range of options: from a
$40,000 proposal that involved proprietary software and taking cards
offsite for more efficient scanning to a $1,500 proposal based on open
source software and leaving data entry and scanning purely up to the
library. The library selected the able services of a programmer who
worked at a nearby library and who recommended the purchase of a
server and quickly designed and built a PHP/MySQL database meeting
specifications. While open source was not the initial goal, that such
software ended up forming the basis for the database contributed
greatly to the library's ability to produce a high-functioning,
easy-to-use database on a limited budget.
Once the library selected the underlying software, the real work of
database design began. Priorities were a simple interface on both the
administrative and public ends; completely web-based access for both
data entry and retrieval; and the ability to attach multiple images to
each record. The library pictured a database that would be simple and
fast for access by users all over the world and one that could be
expanded to meet larger goals for the local history collection. Starting
with attached obituary images, the library envisioned eventually
allowing researchers to submit their own photos, family trees,
marriage licenses, or other material that could supplement the
library's own collection to tell the broader story of each person
represented in the database. Flexibility for growth of both the size and
scope of the collection was important.
These ambitious plans remained in the future. In the near term,
the library had to balance providing extensive access to the obituaries
with completing the work within a relatively brief time frame. The
poor quality of the newsprint and cards meant doing optical character
recognition (OCR) was not realistic. Thus, while researchers would be
able to view an image of the complete obituary, searching would be
limited to data entered by volunteers. This meant maximizing the
number of access points was ideal; at the same time, too many access
points could slow work to a crawl. In the end, after consultation with
genealogical researchers, the library chose the following fields:
• first, middle, last, and maiden names of subject in separate
fields
• first and last names in one field for: mother, father, spouse(s),
children
• date and place (city, state) of birth and death
• cemetery and funeral home
• metadata covering obituary source and date and source of
digital record
The database allowed for multiple spouses and children. It also
included a notes field which, at the early stages, was left blank but
proved to be invaluable for later expansion.
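The field list above can be sketched as a small relational schema. The example below is only an illustration: it uses Python's built-in sqlite3 module rather than the PHP/MySQL platform the library actually used, and every table name, column name, and sample value is hypothetical. It shows one way to allow multiple spouses, children, and images per record while keeping a free-text notes field.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the names used in the library's PHP/MySQL database.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    first_name TEXT, middle_name TEXT, last_name TEXT, maiden_name TEXT,
    father TEXT, mother TEXT,              -- first and last name in one field
    birth_date TEXT, birth_city TEXT, birth_state TEXT,
    death_date TEXT, death_city TEXT, death_state TEXT,
    cemetery TEXT, funeral_home TEXT,
    obituary_source TEXT, obituary_date TEXT, record_source TEXT,
    notes TEXT DEFAULT ''                  -- left blank at first, used later
);
CREATE TABLE relative (                    -- any number of spouses and children
    person_id INTEGER NOT NULL REFERENCES person(id),
    relation TEXT NOT NULL CHECK (relation IN ('spouse', 'child')),
    full_name TEXT NOT NULL
);
CREATE TABLE image (                       -- any number of images per record
    person_id INTEGER NOT NULL REFERENCES person(id),
    file_name TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database holding one made-up record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO person (first_name, last_name, maiden_name) VALUES (?, ?, ?)",
        ("Mary", "Example", "Sample"),
    )
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO relative VALUES (?, ?, ?)",
        [
            (person_id, "spouse", "John Example"),
            (person_id, "child", "Ann Example"),
            (person_id, "child", "Tom Example"),
        ],
    )
    conn.execute("INSERT INTO image VALUES (?, ?)", (person_id, "example_card.jpg"))
    return conn, person_id
```

Keeping spouses, children, and images in their own tables is one common way to avoid a fixed number of columns per record; the original design may well have handled this differently.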
The database would be accessed via the library's website by users
through either a basic or advanced search. Upon visiting the Family
History Index Online, users see a basic search screen for the name
field, which searches all name-related fields. This is often enough to
get users to the obituary they need.
Users also have access to an advanced search screen, which
searches first, last, and maiden name as separate fields. It also allows
access via date and place of birth or death, cemetery, and funeral
home, making it useful to those who may not be searching for a
specific person but for more general historical information. Data
typed into any of the fields on the basic or advanced search screens
will also search the notes field, which may contain additional data
from various sources.
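The two search modes described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the library's PHP code: records are plain dictionaries, the field names are hypothetical, "all name-related fields" is taken to mean the subject's four name fields, and matching is a simple case-insensitive substring test. In both modes a term also matches when it appears in the notes field.

```python
# Hypothetical field names; the real database's columns are not documented here.
NAME_FIELDS = ("first_name", "middle_name", "last_name", "maiden_name")


def _contains(record, field, needle):
    """Case-insensitive substring test that tolerates missing or empty fields."""
    return needle in (record.get(field) or "").lower()


def basic_search(records, term):
    """One search box: match any name field or the notes field."""
    needle = term.strip().lower()
    return [
        r for r in records
        if any(_contains(r, f, needle) for f in NAME_FIELDS + ("notes",))
    ]


def advanced_search(records, **criteria):
    """One term per field; every term must match its own field or the notes."""
    def matches(record):
        for field, term in criteria.items():
            needle = term.strip().lower()
            if not (_contains(record, field, needle)
                    or _contains(record, "notes", needle)):
                return False
        return True

    return [r for r in records if matches(r)]
```

For example, `advanced_search(records, cemetery="oak")` would return records whose cemetery or notes mention "oak", which suits researchers looking for general historical information rather than a specific person.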
The designer brought another feature to the database that would
set the Family History Index Online apart from similar projects at
other libraries, a hyperlinking feature that makes Chelsea's project
unique in allowing researchers to jump from record to record,
following the obituaries of family members represented in the
database. If the parent, child, or spouse of an obituary subject is
represented in the database, his or her name will appear as an active
link. Clicking on this link takes the user to that person's obituary
record. In this way, researchers may discover family connections they
did not know existed and be better able to visualize how families relate
to each other. Rather than noting the names of relatives, backing out
to the initial search screen, and starting a new search, database users
can simply hop from one relevant record to the next.
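A minimal sketch of the linking idea follows, written in Python rather than the PHP the database was built in. It assumes, purely for illustration, that relatives are stored as "First Last" strings and that a relative is linked when that string matches the first and last name of another record; the field names and the matching rule are hypothetical, and a real implementation would need to handle people who share a name.

```python
def full_name(record):
    """Normalized 'first last' key for a record (hypothetical field names)."""
    return f"{record['first_name']} {record['last_name']}".strip().lower()


def link_relatives(record, all_records):
    """Pair each relative's name with the id of a matching record, or None.

    A non-None id means the name can be shown as an active link to that
    person's obituary record; None means it is shown as plain text.
    """
    index = {}
    for other in all_records:
        # Keep the first record seen for each name; ties are not resolved here.
        index.setdefault(full_name(other), other["id"])

    return [
        (name, index.get(name.strip().lower()))
        for name in record.get("relatives", [])
    ]
```

The lookup table is built once per call, so resolving all of a record's relatives takes a single pass over the database rather than one search per relative.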
Once the database design was complete, both staff and an initial
group of volunteers participated in testing. Volunteers tested for ease
of use from the administrative and public perspectives as well as for
how the database would meet the needs of genealogy researchers. The
library was lucky to have a number of experienced genealogical
researchers among its volunteer corps. They provided invaluable
feedback throughout the project. The database designer and project
manager worked together to tweak the database in late 2005 and early
2006, leading up to training of volunteers and beginning of data entry
work.
Timeline
• April 2005 - Chelsea District Library creates a librarian position
with duties including digitization of the Family History Index
• October 2005 - Database development begins
• January 2006 - Database development and testing completed
• February 2006 - Volunteer training and data entry work begins
• May 2006 - Scanning of obituary cards begins
• October 2006 - Data entry of 15,000 records completed; library
closes to move to new building
• January 2007 - Scanning resumes; proofreading and database
updates continue
• June 2007 - Final image attached to database
• October 2007 - Family History Index Online released to public;
timed to coincide with Family History Month in Michigan
Digitization process
The library had two old PCs available for use by volunteers in a
back office, a setup that proved beneficial to the project, as volunteers
found they could often focus better on the mundane task of data entry
while working in pairs. Because the database was entirely web-based,
no software installation was required. Volunteers also used each other
as resources to answer questions such as the interpretation of unclear
wording or how to enter data in a particular field. Each volunteer had
committed to working two hours per week for a period of at least three
months, in an attempt to minimize the amount of retraining that
would need to be done. Most volunteers stayed much longer, seeing
the project through to completion, and many also worked multiple
shifts each week. Through the volunteer program, these workers were
able to sign up for shifts during all hours the library was open,
including evenings and weekends, maximizing the number of people
who could be involved.
As the project progressed, it became clear both that some
volunteers wanted to participate but lacked interest or ability in data
entry and also that volunteer tasks existed beyond what the library
had originally envisioned. This led to some refocusing of efforts before
a final workflow developed. In particular, a pair of volunteers took on
the task of taking cards from the filing cabinet and sorting out those
that needed digitization. They stored the sorted cards in a box and
transported them to the office where volunteers were engaged in data
entry. Cards not in use remained in the files, which prominently
displayed signs explaining the project. Volunteers doing data entry
took the sorted cards and entered them into the database. Cards that
brought up questions went into a separate pile for review by the
librarian. The rest went into a "completed" file and moved on to step
2, proofreading. Two volunteers with especially good eyes for detail, as
well as genealogical research experience, were recruited as
proofreaders, tasked with checking every fifth card. While it would
have been ideal to have a second set of eyes on every single database
record, this was not realistic, and conferral with professional
archivists confirmed that a 20 percent rate was more than sufficient.
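Checking every fifth card is a systematic sample, and the 20 percent rate follows directly from the interval. The short Python sketch below illustrates the selection; the function name and the use of a simple list of card numbers are assumptions made for the example.

```python
def proofreading_sample(cards, every=5):
    """Return every `every`-th card from the completed pile, in order."""
    if every < 1:
        raise ValueError("every must be at least 1")
    # Slicing from index every-1 picks the 5th, 10th, 15th, ... card.
    return cards[every - 1::every]


completed = list(range(1, 101))          # 100 hypothetical card numbers
sample = proofreading_sample(completed)  # cards 5, 10, ..., 100
rate = len(sample) / len(completed)      # 20 of 100 cards, i.e. 0.2
```

A fixed interval is easy for volunteers to follow by hand; a random draw of the same size would be an alternative if there were a concern that errors followed a regular pattern.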
After proofreading, cards moved on to scanning. Scanning
represented another challenge, in part because the library had only
one scanner which was also used for other purposes. In addition,
while the scanning software that accompanied the purchased HP
Scanjet 5500c was relatively simple, not all volunteers felt they had
the requisite level of computer skills. As a result, scanning did not
begin until about three months after data entry and was handled by a
subset of about 10 volunteers. Volunteers scanned cards at 300 dpi
and saved them in the archival standard TIF format. Because the
clippings were black-and-white newsprint, which already has a very
low resolution, a higher scanning resolution would not provide any benefits.
These archival copies of the cards have been retained in separate,
backed up files so that the original cards should never require
rescanning. The entire database, including these archival images, was
set to copy to a tape drive, with the tape changed daily. After one
week, tapes were reused for new backup copies, leaving the library
with multiple recent backups for added security.
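The rotation described here, a fresh tape each day with reuse after a week, can be expressed as a simple mapping from date to tape. The Python sketch below assumes a set of seven tapes labeled by weekday; the labels and the seven-tape count are assumptions, since the source states only that tapes were changed daily and reused after one week.

```python
from datetime import date, timedelta

# Assumed labels: one tape per weekday, reused every seven days.
TAPE_LABELS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def tape_for(day):
    """Return the label of the tape that should be in the drive on `day`."""
    return TAPE_LABELS[day.weekday()]  # weekday(): Monday is 0, Sunday is 6


def oldest_backup_available(day):
    """With seven tapes, the oldest copy on hand is from six days earlier."""
    return day - timedelta(days=len(TAPE_LABELS) - 1)
```

Under this scheme a problem has to be noticed within a week, before the tape holding the last good copy is overwritten.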
At this point, the cards were refiled by a volunteer, returning
access to the public who still relied on the paper file for research. The
digital images then underwent additional processing. The TIF images
were converted, using Adobe Photoshop Elements, into compressed
JPEG images with small file sizes. In general, the image of each card
posted to the database was no more than 100 KB, so it would download
quickly even over dial-up internet connections. Volunteers manually
attached these images to each database record, completing the cycle.
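A rough calculation shows why a ceiling of about 100 kilobytes per image suits dial-up users. The Python sketch below assumes a nominal 56 kbit/s modem and treats a kilobyte as 1,000 bytes; both figures are assumptions for illustration, and real dial-up throughput was usually lower than the nominal rate.

```python
MAX_IMAGE_BYTES = 100 * 1000   # assumed ceiling of 100 KB per posted image
MODEM_BITS_PER_SECOND = 56_000  # assumed nominal dial-up speed


def download_seconds(size_bytes, link_bits_per_second=MODEM_BITS_PER_SECOND):
    """Ideal transfer time for a file, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


per_image = download_seconds(MAX_IMAGE_BYTES)  # about 14.3 seconds
collection_bytes = 15_000 * MAX_IMAGE_BYTES    # at most 1.5 GB for 15,000 images
```

Even under these ideal conditions a single card takes several seconds to load, which helps explain why the archival TIF files were kept offline and only the compressed copies were attached to records.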
Data entry work progressed remarkably quickly, with volunteers
putting in close to 2,000 hours between February and October 2006.
In October 2006, the project went on hiatus while the library packed
up its temporary quarters and moved into a new facility. Volunteers
completed data entry for the final card just before the move
commenced, adding more than 15,000 records into the database in
eight months. Between half and two-thirds of the scanning had also
been completed at this point. The move to the new facility caused
significant delays in the project due to other priorities for the library's
technology staff. Scanning resumed in February 2007, however, and
the final image was attached to the database in June. The library used
the next few months to continue proofreading and to test the
robustness of the database, releasing it to the public in conjunction
with Michigan's Family History Month in October 2007.
Results
Chelsea District Library's Family History Index digitization
project was a major success. Through a combination of creativity,
open source software, and volunteer contributions, the library
produced a highly usable online database providing full access to more
than 15,000 obituaries for less than $5,000, completing work in about
15 months. It demonstrates that even at an institution with little staff
and little specific expertise, a project can be developed to meet both
the community's needs and the standards set by the library and
archives world. The community involvement and grassroots nature of
this project made it truly special for those who participated. It drew
attention to the library, increased the volunteer corps, and gave
community members a sense of ownership for a key piece of the
library's collection.
The flexibility with which the database was designed has proven
vital to its success. Once done with the initial work of populating the
database, the library turned its attention to improvements. One goal
was to provide full-text access to all obituaries, even those from
newspapers old enough that the only existing copies were on
microfilm. In early 2008, the Friends of the Library applied for and
received grant funding to help purchase a digital microfilm machine,
which volunteers are currently using to scan obituaries from the
Chelsea Standard going as far back as copies survive, into the late
1800s. These digital images are then either added to existing database
records or used to supplement the database.
There also remained the problem of providing access to obituaries
of local residents that were printed in neighboring newspapers. To this
end, the library looked to its partnerships with local businesses.
Chelsea is served by two local funeral homes, and directors of
both proved willing to provide the library with access to their files.
When the funeral homes submit obituaries to any area newspaper,
they also send an electronic copy, including photo where available, to
the library. These partnerships have allowed the library to enrich the
database with full-text access and color photos for more recent
obituaries. The text that is sent electronically is pasted into the notes
field, and images are higher quality than those ultimately printed in
the newspaper. Along with this material, if the final obituary is printed
in the Chelsea Standard, the scanned newspaper clipping is attached.
The funeral homes have also indicated that they have computerized
files going back a number of years, and the library is investigating the
possibility of further enhancing the database with this material.
As another extension, the library returned to its initial contractor
in 2008 to develop a database on the same platform to house the
library's local history collection, which had been brought out of
storage and organized only in 2007. Much of this material was even
more valuable and unique than the obituaries, leaving the library with
a strong desire to have it made accessible to the public primarily, if
not exclusively, in digital format. Scanning of material and database
development have continued in 2009. While working on this and
other local history initiatives, including a series of oral history
projects, the library has found additional material to enhance the
obituary database records of members of Chelsea's founding and
prominent families.
Aside from the primary lesson that ambitious digitization projects
are not beyond the reach of even very small libraries, the Family
History Index digitization project provided additional lessons that
apply to similar projects and beyond:
• The existence of a well-organized volunteer program allows a library to
think much bigger than would otherwise be practical. People are
out there in all communities who have the time, expertise, and
interest to contribute.
• Planning is good but flexibility is essential. Projects may stray
somewhat from their original vision or carefully thought out
procedures, but that isn't necessarily bad. Being open to new
opportunities and listening to workers and users can ultimately
make a project much richer.
• Open source doesn't have to be terrifying. In fact, open source
solutions are generally very stable, as the Family History Index
Online has proven to be. Aside from minor software upgrades, the
server has provided consistent access to the database for more
than two years with virtually no staff intervention required.
• There's nothing wrong with thinking big, but it doesn't hurt to be
realistic. Ambition leads to projects being even more successful
than initially imagined. That said, a realistic assessment of aspects
such as which tasks could be handled in-house (project
management) and which should be contracted out (database
design) prevented later stumbles.
• Partnerships enhance any library activity. In this case, a good
working relationship with the local newspaper eased the process of
gaining copyright access, and new partnerships with funeral
homes have enhanced both the database and the library's
reputation in the larger community. In other situations,
partnerships could be used to gain access to services or material
the library cannot pay for or obtain on its own.
Small libraries often represent their communities' best hope for
preserving local history. Preserving this history, even with little or no
budget, is within reach if libraries combine their expertise in
information management with technology decisions geared toward
simplicity and a lack of hesitation in taking advantage of the
knowledge and goodwill in their communities. Digitization projects
not only preserve the past but provide an opportunity for greater
community involvement, partnerships, and identification of the
library as a key to the community's overall health.
References
Chelsea District Library. (2009). Chelsea District Library Family
History Index Online. Retrieved June 30, 2010, from
http://fh.chelsea.lib.mi.us/
Collaborative Digitization Program. (2008). Western States digital
imaging best practices, version 2.0. Retrieved March 31, 2010,
from http://www.bcr.org/dps/cdp/best/wsdibp_vi.pdf
Cornell University Library. (2010). Moving theory into practice
digital imaging tutorial. Retrieved March 15, 2010, from
http://www.library.cornell.edu/preservation/tutorial
Michigan State University Libraries. (2005). The making of modern
Michigan: Digitizing Michigan's hidden past. Retrieved March 15,
2010, from http://mmm.lib.msu.edu/
National Information Standards Organization. (2007). A framework
of guidance for building good digital collections. Retrieved March
15, 2010, from http://framework.niso.org/
Building the ALBA Digital Library
Jill Annitto (Archivist)
Abstract
This chapter serves as a case study of how a professional digital library
can be successfully built with a small staff and budget. It discusses the
planning and experiments with beta versions of the Abraham Lincoln
Brigade Archives (ALBA) Digital Library, the final version of which is
available on ALBA's website, through Metropolitan New York Library
Council's (METRO) Digital Metro New York program, and OCLC
WorldCat. The sensitive issues of digitizing another institution's
collection while maintaining ownership of the final product are also
explored.
Keywords: Copyright, CONTENTdm, Database, Digital archive,
Electronic classroom, Educational resources, Funding, Indexing,
Planning, Ownership issues, Small budgets
Introduction
In 1979, recognizing the vital importance of their radical history, and
the need to collect writings, letters, photographs, oral histories and
artifacts that would preserve their story, the Veterans of the Abraham
Lincoln Brigade, the American volunteers who fought with Republican
forces against Generalissimo Francisco Franco during the Spanish
Civil War (1936-39), formed the Abraham Lincoln Brigade Archives
(ALBA). Today ALBA lends its name to a major archive at New York
University's (NYU) Tamiment Library and independently supports
cultural and educational activities related to the war. The ALBA
collections are the most requested at the Tamiment Library. ALBA's
relationship with NYU is non-traditional, which makes the ALBA
Digital Library unique. NYU owns and maintains the Archives and
hosts many of ALBA's programs, yet the ALBA name gives a different
impression.
This chapter explores the challenges of digitizing a collection that
is owned by another institution and how to overcome working with a
limited budget, dated technology, and minimal staff to produce a
professional digital resource. The ALBA Digital Library (Abraham
Lincoln Brigade Archives, 2008) evolved from an ineffective
form-based website tool to a fully indexed resource using CONTENTdm in a
matter of months.
ALBA Goes Digital
Until a full-time Executive Director was hired in 2007, ALBA was
run first by a group of volunteers and then by a string of part-time
administrators. The ALBA Board of Governors hired me as the
Assistant Director in May 2008 just as they began reconsidering their
outdated website, a major step for the small organization.
By June 2008 the Executive Director had resigned, leaving me as
ALBA's only employee. It provided me with a great opportunity to
help redesign the website alongside a subcommittee of the Board of
Governors. The Web Committee, consisting of four history professors
living across the United States, set a timeline of four months for the
site's overhaul, with an anticipated launch date of early October. The
Board wanted to stake a claim as the premier electronic resource for
information on the Spanish Civil War.
The Impetus for a Digital Library
In July 2008 ALBA launched its first annual ALBA Teachers Summer
Institute at NYU. The Institute hosts teachers from New York City
public schools and exposes them to the history, art, and politics of the
Spanish Civil War. Part of this immersion includes a trip to NYU's
Tamiment Library to view the Brigade's Archives where teachers
receive an overview by the collection's archivists as well as professors
from various departments at NYU.
After this initial success, ALBA decided to sponsor two more
Summer Institutes (as well as year-round professional development
seminars) in Tampa and San Francisco in 2009. The new settings
precluded a site visit to the Archives so we needed to find a way to
bring the Archives to the teachers.
The solution to this dilemma was the creation of a digital library
or archive to be launched in conjunction with the new website. The
website redesign was going to make ample use of ALBA's collection at
NYU's Tamiment Library anyway, slowly integrating digital
collections of letters and photos, and eventually including video clips
and oral histories. These primary resources would strengthen existing
lesson plans and other educational modules once the site was
completed and allow for a more dynamic classroom experience. Since
the site was already being designed, we thought it would be a great
opportunity to have a collections database created for ALBA by our
website designers.
Defining the Digital Library
In my experience, archives or library staff working in conjunction with
a representative from the information technology department do
most digital library planning. At ALBA, I was the only staff member
and every decision required consensus of the Web Committee, busy
professionals with jobs and other responsibilities. Every time an issue
arose it could take nearly a week to be resolved. This is a problem that
small museums with very active boards will be familiar with,
particularly when board members have little time, experience or
interest in the digital project at hand.
We quickly ran into a problem of defining the digital library; the
Web Committee believed everything created by ALBA was archival. I
was in favor of a more traditional, primary source, collections-based
digital library with a thesaurus and cataloging guidelines. My idea for
ALBA's digital library was to create a system that would recall only
primary sources from the war itself. The digital library would be its
own entity containing items that are separate from all other files
posted to the website.
After several weeks of negotiations via email and conference calls,
the Web Committee decided the best option would be to include all
online documents in the digital library. The Web Committee charged
the website designers with creating a web-based, simple form-based
recall system as part of their contract.
The Beta Versions
Albita
Named Albita (or 'little ALBA'), this resource was first launched in
November 2008 along with the debut of the new website. It included
the organization's quarterly newsletter, book reviews, and transcribed
letters, not primary archival sources. Uploaded items were listed in
random order rather than alphabetically or by date. In the end the
design did not conform to the standards of information professionals;
Albita had become a "junk drawer" of every single file uploaded to the
website.
Document Library
Even after it was renamed and its display reconfigured, the Document
Library was not robust enough. To recall specific items, the user had
to consciously use certain keyword strings, e.g. "George Watt Prize
Winner," when retrieving items. It did not include a thesaurus and the
plain-text description field did not allow for paragraph breaks. These
descriptions were displayed as a solid block of text.
Media Library
Simultaneously added to the site was the Media Library. This
database appears three pages deep within our Resources tab. It is the
only way to access images (from archival photographs to logos of
partner organizations) without using the ALBA Digital Library or
slogging through the results from a general site search. Unfortunately,
because it is not linked to any other page, it is not often accessed. The
plain-text description field is displayed as a block of text and does not
allow indexing.
Once the Web Committee saw Albita, the Document Library, and
the Media Library in action the problems became apparent. We
considered going back to the drawing board with our website
designers, to create a thesaurus or a more sophisticated database
system. Unfortunately, our original contract did not cover building a
true digital library; the cost would have been prohibitive.
I explained the possibilities of a professional digital library:
adding our records to a consortium's collection, making them widely
available through WorldCat, including them in an NYU catalog at the
item level. It was difficult to convince the Web Committee that we still
had affordable, professional options that would blend seamlessly into
the new website.
Building a Better Mousetrap
As the only person on staff who could design and implement the
digital library, I had to balance time spent on digitization with my
other duties, including grant writing, public programming, and
administrative issues.
The first problem for ALBA to overcome was the lack of any type
of digitization equipment. The nature of ALBA's work and budget did
not necessitate having a full flatbed scanner or laptop on hand.
Another major concern for ALBA was the high cost of software. A
rough estimate for a full software package ran to the thousands of
dollars. We didn't intend to scan the entire collection, just highlights;
for our purposes what we needed was something that would allow us
to scan part of the collection but offer maximum exposure.
In January 2009 I became aware of the Metropolitan New York
Library Council's (METRO) Digital New York program. NYU's status
as an institutional member of METRO allowed ALBA to benefit from
the digitization program, a partnership between METRO and OCLC
and CONTENTdm. As a separate organization, ALBA's much smaller
annual budget resulted in a nominal $200 annual fee, paid to METRO
for access to the CONTENTdm desktop module and the upload of 500
discrete items. With software issues out of the way, we could focus on
permissions and access to the collection.
Since ALBA's intention was to work with a collection that is
owned by another large institution, we had to be very thorough and
diplomatic about the project. We presented a sample record that
would name NYU and Tamiment Library as the copyright holders, as
well as instructions on obtaining permission to use the images. We
assured NYU that the digital library pages on ALBA's website would
include the same information. A distinction is made on both the
website and in each record that the ALBA Digital Library is published
by ALBA while the copyright is held by NYU. After several weeks of
negotiation, we were free to move ahead with the project.
With software and permissions settled we were left with the issue
of hardware. ALBA Board Member and NYU Professor of Spanish and
Portuguese James Fernandez offered his laptop and Epson flatbed
scanner for the duration of the project. After several weeks of further
negotiation in order to gain access to the collections, Michael Nash,
ALBA Board Member and Director of Tamiment Library, allowed us
to scan the items ourselves, free of charge.
Selection, Policies, and Standards
In general there is a lack of digitized archival resources available on
the Spanish Civil War. The closest to any kind of digital information
on the subject is through Spartacus Educational (Simkin, 1997), an
online British encyclopedia dedicated to educating students on
history, with a significant amount of information on progressive
history. But even this site is lacking in primary source materials.
Besides the Abraham Lincoln Brigade Archives, there are other
Spanish Civil War archives in the United States: the Southworth
Collection at the University of California at San Diego (see University
of California, San Diego, n.d.) and the Spanish Civil War Collection at
the University of Illinois at Urbana-Champaign (see University of
Illinois at Urbana-Champaign, n.d.). Still, the Abraham Lincoln
Brigade Archives at NYU is the largest collection of American
volunteers' archives in the US.
Our goal was to put forth a curated collection of the most
compelling and historically significant items in the Archives; the
original plan was to digitize 500 items, but the realities of time and
staff constraints reduced this number to an initial 150 items. These
items included postcards, letters, newspaper articles, a multi-lingual
newsletter, and a telegram from Ernest Hemingway.
Dr. Fernandez performed the selection of four collections based
on the following criteria: size (only one manuscript box each),
condition (stable enough for handling and scanning), and variety
(each collection included letters from a variety of people). These same
collections had been used in the Teachers Summer Institute and they
proved to be popular with the teachers.
When it came to setting the scanning standards I turned to the
University of Wisconsin-Milwaukee Libraries' Digital Collections (see
University of Wisconsin-Milwaukee, 2006) pages, namely the
Transportation Around the World, 1911-1993 collection that I worked
on as a graduate student there in 2003. The collection was built using
CONTENTdm, then in its infancy. I followed UWM's digitization
standards and policies as well as their formula for long-term archival
storage as a basis for the ALBA Digital Library.
Digitization Standards
All items were scanned in full-color at a resolution of 600 dpi
using an Epson flatbed scanner. The items were saved as the highest
quality TIFF files and stored on an external hard drive. We refer to
these TIFF files as the Archival Images as they are used only to create
access images (PDFs or JPEGs) and are otherwise not accessed.
Access Images were created for web delivery, in the form of
thumbnails on the ALBA site and for display within the CONTENTdm
records, as well as for everyday use. The letters in the collection were
not run through OCR, so the PDFs are not searchable. The access images
were saved on both DVDs and the external hard drive. Because the METRO
contract only covered 500 discrete items, we used Photoshop to stitch
the JPEGs together to create a single PDF file for multiple-page items.
Not only did this save room in CONTENTdm, it was also useful to
have multiple-page documents combined for reproduction purposes
and to better keep track of collections.
All of this information is posted on the Digitization page of the
ALBA Digital Library section. Sharing this information shows
researchers (and potential donors) that the digital library adheres to
archival standards while also helping other librarians and archivists
plan their projects.
To access the images it is important to label them properly. Some
scanners and scanning programs assign numbers that may or may not
be useful. Auto-numbering systems can cause problems if the items
are not easily identified by sight (letters, manuscripts). I named the
files according to the manuscript collection followed by a number that
corresponded to the letter's order in the folder, followed by a decimal
that corresponded to the page number. For example, Lardner.1.4
would be the fourth page of the first letter of the Lardner Collection.
The stitched PDF file would read Lardner.1.
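The naming convention described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used on the project; the function names page_name, stitched_name, and parse_page_name are hypothetical, and the sketch assumes that the letter and page parts are always whole numbers.

```python
from typing import Tuple


def page_name(collection: str, letter: int, page: int) -> str:
    """Name for one scanned page, e.g. Lardner.1.4."""
    return f"{collection}.{letter}.{page}"


def stitched_name(collection: str, letter: int) -> str:
    """Name for the single PDF stitched from all pages of one letter."""
    return f"{collection}.{letter}"


def parse_page_name(name: str) -> Tuple[str, int, int]:
    """Split a page name back into (collection, letter, page)."""
    collection, letter, page = name.rsplit(".", 2)
    return collection, int(letter), int(page)


print(page_name("Lardner", 1, 4))      # Lardner.1.4
print(stitched_name("Lardner", 1))     # Lardner.1
print(parse_page_name("Lardner.1.4"))  # ('Lardner', 1, 4)
```

Because the collection name, letter number, and page number are all recoverable from the file name, a file can be traced back to its place in the folder without opening it.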
Indexing
The documents were indexed using Dublin Core metadata
(Dublin Core Metadata Initiative, 2010) fields, which are standard in
CONTENTdm, including: Creator, Date, Identifier, Type, Source,
Description, Format, Coverage, References, Relation, Language,
Publisher, Rights, and ALBA Reference Number.
Included in CONTENTdm is a default thesaurus (Thesaurus for
Graphic Materials) for the Identifier field. Similarly, a default
thesaurus provided the terms for geographic location in the Coverage
field. For those fields whose vocabulary was unique to the Abraham
Lincoln Brigade Archives, I built a new thesaurus. These fields were:
Type (e.g. text, image), Format (e.g. paper + size in centimeters),
Identifier (e.g. envelope, autograph letter signed, typed letter),
Publisher, Rights, and Creator. I added new Creators as they came up
in each collection; some collections had as many creators as there
were letters. In addition, for collections that had an online finding aid
at NYU I included a link to that page.
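For readers planning a similar project, the field list and local thesauri described above might be modeled as follows. This is a hedged sketch rather than a description of how CONTENTdm stores records: the names FIELDS, THESAURI, and validate are hypothetical, the thesaurus terms are only the examples quoted in the text, and the sample record is a placeholder, not an actual item from the collection.

```python
# Field names as listed in the text.
FIELDS = [
    "Creator", "Date", "Identifier", "Type", "Source", "Description",
    "Format", "Coverage", "References", "Relation", "Language",
    "Publisher", "Rights", "ALBA Reference Number",
]

# Hypothetical local vocabularies, limited to the examples given in the text.
THESAURI = {
    "Type": {"text", "image"},
    "Identifier": {"envelope", "autograph letter signed", "typed letter"},
}


def validate(record):
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    for field in FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    for field, terms in THESAURI.items():
        value = record.get(field)
        if value is not None and value not in terms:
            problems.append(f"{field}: {value!r} is not in the thesaurus")
    return problems


# Placeholder record: every field present, two controlled fields filled in.
record = {field: "" for field in FIELDS}
record.update({"Type": "text", "Identifier": "typed letter"})
print(validate(record))  # []

bad = dict(record, Identifier="postcard")
print(validate(bad))     # ["Identifier: 'postcard' is not in the thesaurus"]
```

A check of this kind is one way to keep a small, locally built thesaurus consistent when a single person is cataloguing over several months.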
In all, it took four months for the 150 items in the ALBA Digital Library
to be scanned, stitched, catalogued, and uploaded. I generally spent
about fifteen minutes cataloguing each letter, sometimes longer
depending on the length of the document. I worked on the digital
library an average of eight hours per week with some weeks going by
without any work at all. As collections were completed I added
information to the ALBA Digital Library page and continued to
redesign the website as necessary.
Final Product and Reception
The final product can be found at
http://www.alba-valb.org/resources/digital-library. The Teachers Institute Alumni find the
resource easy to use and helpful in the classroom, and ALBA even
used the scanned collections to create facsimile copies of the Archives
for the Tampa and San Francisco seminars.
As of publication, the digital library is available as a discrete
collection on Digital Metro New York
(http://cdmi2840i.cdmhost.com/cdm4/search.php), the digital program of the
Metropolitan New York Library Council; on the New York Heritage Digital
Collections (New York Heritage, 2008) website; and at the item level on
OCLC WorldCat (http://www.worldcat.org).
Seeing a Digital Project from Beginning to End
Planning is Key
Work backwards and set aside a day or two to really think about
what you want to see in the final product. The small team (just myself
most of the time) helped keep bureaucracy to a minimum. Remember:
it is cheaper to do it right the first time.
Unofficial Market Research
I asked librarian and archivist friends what they liked to see in
digital libraries and to give me advice on moving forward. I also
reflected on what struck me in online collections, both the positive
and the negative.
Tread Lightly
Occasionally it was necessary to receive approvals and it was
important to remember that when dealing with large institutions there
will be politics. This exists everywhere and I learned not to take it
personally. Many of these issues were deep-seated and existed long
before I joined the organization. I also learned that having a board
member installed in a specific department or company does not
guarantee easy access or donated materials.
Within my own organization, some board members did not
understand the potential of planned, professional digital libraries, or
that one person could undertake such a project. Price was also an
issue; it was only after Albita and the Document Library failed that
CONTENTdm was considered and accepted as an amazing deal. In the
end, waiting for the Web Committee members to come around in their
own time was the best plan of action for this project.
Push the PR
As I completed each collection I sent information about the digital
library everywhere: from Facebook and Archivists' Roundtable of
Metropolitan New York to ALBA's listserv, e-news, quarterly
newsletter, and fundraising appeals. While this publicity was mainly
sent to people within the ALBA network, it also garnered the attention
of local archivists and library students interested in doing small digital
projects on limited resources.
References
Abraham Lincoln Brigade Archives. (2008). ALBA Digital Library.
Retrieved March 30, 2010, from
http://www.alba-valb.org/resources/digital-library
Dublin Core Metadata Initiative. (2010). Retrieved March 30, 2010,
from http://dublincore.org
New York Heritage. (2008). Retrieved March 30, 2010, from
http://www.newyorkheritage.org
Simkin, J. (1997). Spartacus educational. Retrieved March 30, 2010,
from http://www.spartacus.schoolnet.co.uk/Spanish-Civil-War.htm
University of California, San Diego. (n.d.). Southworth Spanish Civil
War Collection. Retrieved March 30, 2010, from UC San Diego
Libraries website,
http://libraries.ucsd.edu/locations/mscl/collections/southworth-spanish-civil-war-collection.html
University of Illinois at Urbana-Champaign. (n.d.). Spanish Civil War
Collection. Retrieved March 30, 2010, from University of Illinois at
Urbana-Champaign Rare Book and Manuscript Library website,
http://www.library.illinois.edu/rbx/SCWPeople.htm
University of Wisconsin-Milwaukee. (2006). Transportation around
the World, 1911-1993. Retrieved March 30, 2010, from Digital
Collections - Transportation around the World, 1911-1993 website,
http://www4.uwm.edu/libraries/digilib/transport/index.cfm
Digitization and Access of Louisiana Oral
Histories: One Oral History Center's
Experience in the Digital Realm
Gina R. Costello (Louisiana State University Libraries)
Abstract
The Louisiana State University (LSU) Libraries Center for Oral
History began an effort to digitize at-risk and high-demand collections
in 2007. The Center acquired digitization equipment and server space and
collaborated with the Libraries Special Collections Digital Services
Librarian to offer digitized oral histories online via the statewide
Louisiana Digital Library (LDL). This paper details the history of the
ongoing development of a digitization program for oral history
materials using two staff members and limited resources. Decisions
about what materials to digitize and how, equipment and software,
and issues with access and preservation will be discussed.
Keywords: Audio digitization standards, CONTENTdm, Digitizing
audio, Digitization equipment, Digital library, Digitization workflow,
Oral history, Oral history interviews.
Introduction
The Louisiana State University (LSU) Libraries T. Harry Williams
Center for Oral History began to digitize at risk and high demand
collections in late 2007. Planning for the systematic digitization of the
primarily analog collection began a year prior to any digitization
efforts. The Center sought advice from an expert in the field, acquired
digitization equipment and server space, hired a full time employee to
manage digitization, and collaborated with the Libraries Special
Collections Digital Services Librarian to offer digitized oral histories
online via the statewide Louisiana Digital Library (LDL).
The Center staff and the Digital Services Librarian have
prioritized collections for digitization based on fragility or patron
demand, made decisions about organization and access of the audio
materials for the public, and addressed copyright issues. Only a small
number of oral history collections have been added to the LDL,
although over 700 hours of tape have been digitized so far.
This paper details the history of the ongoing development of a
digitization program for oral history materials with one full time staff
person and partial effort from another staff member. Decisions about
what materials to digitize and how, equipment and software, and
issues with access and preservation will be discussed. Results of the
digitization and online access efforts have been mixed, but may serve
as an example for oral history programs wishing to develop a more
programmatic approach to digitization.
Center History and Description
The T. Harry Williams Center for Oral History at LSU Libraries
Special Collections documents the social, political, and cultural history
of LSU and the state of Louisiana by conducting, collecting,
preserving, and making available to the public oral history interviews
of folk artists, war veterans, governors, congressmen, state and local
officials, civil rights activists, and other historically prominent figures
in Louisiana. The Center maintains over 4,000 hours of tape-recorded
interviews. The three person staff and a number of student workers
transcribe, index, and deposit oral history interviews for archival
storage at LSU Libraries Special Collections.
The Center, opened in 1991, is named after a man who helped
legitimize the field of oral history. Dr. T. Harry Williams, a popular
and acclaimed southern history professor at LSU, spent more than ten
years researching the biography Huey Long. Published in 1969, this
Pulitzer Prize and National Book Award winning book drew upon
Williams' tape-recorded interviews with nearly 300 individuals.
Williams used a 30-pound Webster Electric Ekotape reel-to-reel tape
recorder to capture the interviews.
The primary mission of the Center is to document the history of
LSU. Since the history of the state and university are closely
intertwined, many broader Louisiana subjects are documented as
well. Public outreach through training workshops, consultations, and
collaborations with individual researchers, community groups,
classes, and institutions enhances oral history collections throughout
the state. Often, the collections are donated to LSU Libraries for
preservation and public access. In many cases copies are provided to
libraries, schools, and museums, providing access for members of the
communities in which the oral histories were collected.
The Center differs from some oral history centers in its
commitment to providing fully edited transcriptions of all recorded
interviews. Barring any restrictions placed on interviews by the
interviewee or interviewer, the audio and a full transcription are made
available to scholars and the general public. Because of the large
volume of interviews that are collected each year, the Center
maintains a backlog of interviews that are not fully processed (i.e.,
digitized if applicable, transcribed, audited, and cataloged). Interviews
are organized into more than 40 different series, including Civil
Rights, Military History, and Political History.
The Center Director has taken a more programmatic rather than
project-based approach to the digitization of the collected oral
histories. To ensure that preservation issues are addressed and
collection access is a top priority, the Director employs a full time
sound technician/webmaster at the Center. Center staff also works
with the Special Collections Digital Services Librarian to mount oral
history collections to the Louisiana Digital Library (LDL)
(http://www.louisianadigitallibrary.org).
The Center makes available materials that are not restricted by
the interviewee or interviewer. Interviews are digitized on demand for
patrons, for preservation purposes, and for public access on the LDL.
Prior to the acquisition of digitization equipment, patron requested
copies were recorded from cassette tape to cassette tape. Now
materials are delivered to patrons via CD unless a cassette tape is
requested. Copies are provided for a fee to patrons, although a small
number of oral histories maintained by the Center are available for
listening free online in the LDL. Center staff generally digitize fewer
than five interviews per month for patron requests.
The funding for the Center is a mix of Libraries monies and
endowment funding. The Libraries pays the salaries of the Director
and two full time employees. Student workers' pay, a portion of travel
money, and some supplies are also paid for by the university. The
Libraries purchases and provides support for computers for the
Center staff and student workers. Endowment funds cover most travel
expenses, the majority of the equipment (specifically the field
recorders, digitization station, software, fax machine, scanner), any
Graduate Assistantships, additional student workers, and the majority
of the transient workers' (e.g., professional interviewers, transcribers,
editors) wages.
Early Forays in Digital Access
One of the earliest digital projects the Center was involved with was a
pilot project to digitize oral histories that are part of the University
History Series sub-series, Integration and the African American
Experience at LSU. The sub-series contains interviews with black
students, faculty, and administrators at LSU during integration (1950-
1970), plus interviews with lawyers and their clients who were
involved in key lawsuits, as well as politicians and others who were
vocal opponents or supporters of integration. The resulting digital
collection, named "Integration and the Black Experience at LSU"
(2003), contains audio files and transcriptions of interviews with three
individuals recorded between 1985 and 1998.
This legacy digital collection is scheduled to be revamped soon.
The ".rm" or ".ram" audio files are available for listening only in
RealPlayer and must be downloaded to the listener's computer before
playing. The digital files were created more than eight years ago; the
sound quality could be improved, and the information about
equipment and digitization method has been lost. The analog tapes
will be re-digitized and optimized using current technologies.
Between 2001 and 2005, the Center utilized the skills of their part
time webmaster and other staff members to create several online
exhibitions and presentations (T. Harry Williams Center for Oral
History Exhibits and Presentations, 2009) using readily available
software and tools: simple HTML, PowerPoint, and Windows Movie
Maker. Notable among these is the digital exhibition, "Baton Rouge
Bus Boycott of 1953. A Recaptured Past" (2009) which includes a
background and chronology of the event complete with photographs
and audio excerpts. "Leaving Vietnam" is a nine minute presentation
of audio clips from the Americans in Vietnam collection, featuring
stories of escape from three Vietnamese refugees who immigrated to
Louisiana around 1975 while fleeing Communist takeover. The
presentation debuted at the 2005 Oral History Association annual
conference and is currently available on YouTube, where it has been
viewed over 6,000 times. Two other presentations were also mounted
on YouTube to provide ease of access.
In 2004 Center staff also began digitizing oral history transcriptions
that were available only in paper format. They had some success
using an HP Scanjet 5590 document feed scanner and an early version
of Readiris optical character recognition (OCR) software. The software
was lost, and the Libraries Systems department replaced it with
Readiris Pro 11. Subsequent digitization efforts have been stymied by
problems getting good readable OCR text, so the project has been put
on hold. Student workers often are tasked with re-keying
transcriptions.
In 2007 the Center Director, with the help of the LSU Libraries
Special Collections Exhibitions Coordinator, curated a physical
exhibition called "Have you Heard?: The Past in First Person from the
T. Harry Williams Center for Oral History". The extensive exhibition
contained ephemera and narrative relating to more than a dozen oral
history collections. The Libraries provided two "listening stations",
computers loaded with web-based presentations in the exhibit hall. In
addition, exhibit-goers could check out MP3 players with pre-recorded
narration of the exhibition contents and snippets of oral
history interviews. These digital offerings were made available with
little cost using spare computers and a staff member as the voice of
the narrator. No previous Libraries exhibition had employed
technology in these ways. The Center Director counts the exhibition a
success, as it led to a few collection development opportunities and
awareness of the Center and its mission.
Digitization Station
After attending a digitization workshop at the Oral History
Association annual conference in 2006, the Center Director decided
that the systematic digitization of at risk and high demand analog
collections should become a central focus for the Center. With the idea
of "going digital" but with little research in hand, they initially
purchased two standalone analog-to-digital Lucid AD9624 converters,
which are designed to work in a recording studio setting. They realized
belatedly that the converter units themselves were not useful without
a digitization station, which would cost several thousand dollars. The
Center made the all-too-common mistake of purchasing equipment
without a clear plan for how the individual hardware or software would
interface with existing equipment. Fortunately, they were later able to
purchase a digitization system that uses one of the Lucid converters.
In order to ensure that in the future the Center made sound
investments in technology and established a digitization workflow
appropriate to their needs, the Director sought advice from oral
history expert Doug Boyd at the University of Kentucky. Dr. Boyd
visited LSU in March 2007 to evaluate the Center and conduct an
introductory digital audio workshop for the Libraries staff. He
generated a seven-page report with recommendations for equipment,
collection development, and staffing.
Recommended Analog to digital work station equipment
and software
1. Lucid AD9624 A/D Converter
2. RME Hammerfall DSP 9632 PCI Audio interface
3. 2 Yamaha HS50M 5" Active Monitor
4. 1 Tascam 202MKIII Dual Recorder Cassette Deck
5. 4 BP20 20' TRS - TRS Cable
6. 8 DKQR10 10' Dual RCA - TS Cable
7. 1 Furman PL8II 15 Amp Power Conditioner w/Light
8. 1 DT770pro Closed Studio Mon Headphone
9. 1 Presonus Cent. Station Audio Control Center
10. 1 Plextor PX-716UF External CD-R/DVD+-RW
11. Sony Sound Forge 8.0 Audio Editing Software
12. Sony Noise Reduction 2.0 Noise Reduction Plug-In
The equipment recommended in the report was purchased with
endowment funds nearly a year after Boyd's initial visit. Boyd
returned to the Center to help set up the equipment and train a newly
hired staff member.
Although not all institutions have the funds to hire a consultant,
this less than $2,000 expenditure has proved money well spent for the
Center. Without the vetting of the digitization program, the listed
recommendations for equipment, and Boyd's
encouragement to pursue positioning the Center as a leader in
digitization efforts in the state and the profession, the Libraries
administration might not have acted so quickly to support the
endeavor. The administration approved reallocating funds to hire a
full time staff member for the digitization and in less than two years,
the Center has been able to digitize over 700 hours of interviews with
their single digitization station.
With the addition of a dedicated digitization station and full time
staff member to manage the process, the Center was ready to begin
digitizing in earnest. It was immediately apparent, though, that server
space and file redundancy would be an issue. The average file size of
one hour of digitized uncompressed audio from analog tape is around
1.5 Gigabytes (GB). The Center only had access to a relatively small
74GB drive when digitization began.
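A rough calculation shows why storage became a concern so quickly. The sketch below uses only figures quoted in this chapter (about 1.5 GB per hour of audio, over 700 hours digitized, over 4,000 hours held, a 74 GB drive); the variable and function names are illustrative, and the totals cover master files only, not the optimized WAV and MP3 copies made from them.

```python
GB_PER_HOUR = 1.5        # average size of one hour of uncompressed audio
HOURS_DIGITIZED = 700    # "over 700 hours" digitized so far
HOURS_HELD = 4000        # "over 4,000 hours" of tape in the collection
FIRST_DRIVE_GB = 74      # drive available when digitization began


def master_storage_gb(hours, gb_per_hour=GB_PER_HOUR):
    """Estimated space needed for master files, in gigabytes."""
    return hours * gb_per_hour


print(master_storage_gb(HOURS_DIGITIZED))      # 1050.0
print(master_storage_gb(HOURS_HELD))           # 6000.0
print(round(FIRST_DRIVE_GB / GB_PER_HOUR, 1))  # 49.3
```

On these figures the original drive could hold roughly 49 hours of master audio, while the hours already digitized need about 1 TB and the full collection would need about 6 TB before any derivative or backup copies are counted.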
Working with the Digital Services Librarian and the Libraries
Systems Administrator, the Center temporarily located all digital
audio files to a 5TB networked server that primarily serves as storage
for TIFF images. In late 2009, a regional corporation donated used
storage equipment to the Libraries. The Libraries' Systems
Administrator was able to configure four 2TB RAID 5 storage arrays,
totaling approximately 8TB, for the Center's long term storage. This
unexpected gift enabled the Center to continue digitization efforts,
although they will still have to be selective.
The Center exists not just to archive, but to conduct research-
based oral history interviews and to educate the community about
conducting interviews. To fulfill this mission, the Center keeps a stock
of digital audio field recorders to loan for oral history projects. As
noted earlier, this equipment is purchased with endowment funds.
The Center currently has four Edirol R-09 recorders, two Marantz
CDR 310 recorders, and five Zoom H2 Handy recorders for loan.
Center staff uses a Marantz PMD 661 for interviews.
The Marantz CDR 310 is a portable CD recorder, and the Edirol R-09 and
Zoom H2 Handy recorders employ flash memory. Individuals
borrowing the equipment are trained and instructed on its use.
Digitally recorded interviews are brought to the Center either on CD
or on secure digital (SD) flash memory cards. Interviews are saved to
the Center's server and eventually processed.
Digitization Workflow
The digitization process is handled by one staff member, although he
has recently trained a student worker to help run the digitization
station. The staff member samples the audio to determine the optimal
hardware and software settings and reformats the analog tape to a
lossless uncompressed digital master WAV file. This master file is
captured at a bit depth of 24 and a sample rate of 96 kHz in stereo.
The master WAV file is stored on a networked server, which is
routinely backed up to a tape drive. This "master file" is not altered
after the initial digitization process. Whenever possible, barring any
time or funding constraints, a copy of every collection is also stored on
an external hard drive as well as burned onto a gold archival CD.
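A program with a similar workflow might want an automated check that each master file matches the stated capture settings (24-bit, 96 kHz, stereo). The following is a minimal sketch using Python's standard wave module, not a description of the Center's own tools; MASTER_SPEC, read_wav_params, and meets_master_spec are hypothetical names. The wave module reads plain PCM WAV files, so files written with other header variants may raise wave.Error and would need a different reader.

```python
import wave

# Capture settings stated in the text: 24-bit (3 bytes), 96 kHz, stereo.
MASTER_SPEC = {
    "channels": 2,
    "sample_width_bytes": 3,
    "sample_rate_hz": 96000,
}


def read_wav_params(path):
    """Read channel count, sample width, and sample rate from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def meets_master_spec(path):
    """True if the file's header matches the master capture settings."""
    return read_wav_params(path) == MASTER_SPEC
```

Running such a check before a file is copied to the server, external drive, and archival CD would catch a tape captured with the wrong settings while it is still easy to redo.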
The staff member then creates an optimized file from the master
WAV file. Using Sound Forge software, he improves the signal
strength and removes distortion from the audio. The optimized file is
saved as a WAV file to a different location on the server. He then
generates a compressed MP3 file from the optimized file. This MP3
file is the use copy, and it is also saved to the server.
Unprocessed collections are digitized prior to processing to
facilitate time stamping of the transcriptions. The Center uses Express
Scribe Transcription Playback Software (http://www.nch.com.au/scribe/)
and adds time stamps to the transcriptions based on the
actual run time. Old transcriptions will be re-audited and time stamps
added because the tape time stamps are arbitrary, often reset every
time the tape player is used.
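Time stamps based on actual run time are simple to generate once the audio is digital, because the elapsed seconds are counted from the start of the file rather than from a tape counter. The sketch below is illustrative only: format_timestamp is a hypothetical helper, and the bracketed HH:MM:SS layout is an assumed convention, not the format used by the Center or by Express Scribe.

```python
def format_timestamp(seconds: float) -> str:
    """Convert elapsed run time in seconds to a bracketed HH:MM:SS stamp."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


print(format_timestamp(0))     # [00:00:00]
print(format_timestamp(754))   # [00:12:34]
print(format_timestamp(3725))  # [01:02:05]
```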
Metadata for the entire collection is kept in a Microsoft Access
Database. All oral histories entering the Center are processed based on
a 13-page processing checklist. The processing checklist steps include
1) Accession, 2) Transcribe, 3) Audit, 4) Send to Interviewee, and 5) Edit.
This process is time-tested and thorough. The majority of the oral
history collection is cataloged according to AACR2 standards in
MARC format in the LSU Libraries online catalog (i.e., OPAC). The
Dublin Core metadata in the digital collections is often copied directly
from these catalog records.
Implementation and Access
The Center does not currently have a formal collection development
policy to determine which oral histories are digitized. The interviews
that have been digitized thus far were identified as "high risk" because
they are on an unstable medium, or they were considered to be of particular interest
to researchers and the public. Materials are also digitized "on
demand" for patrons for a fee.
Tapes that were created prior to the Center opening in 1991 and
later donated were assessed for deterioration and digitized as a means
of preservation. For example, the 60 interviews in the Americans in
Vietnam series, recorded between 1974 and 1977, were identified as at risk
and were prioritized for digitization. Because of the content of the
interviews, however, the digitized audio will not be offered via the
LDL. In this situation, preservation of the materials outweighed the
need to provide access.
Particular interviews and/or series of interviews, such as the
Hurricane Betsy Series or the McKinley High School Series, were
digitized because of their potential value to researchers and the
general public. These collections will be uploaded to the LDL as soon
as they are fully processed. Patron-requested interviews that were
digitized on demand for a fee are also candidates for the LDL.
During the past two years the Center staff and Digital Services
Librarian have discussed workflows for uploading audio to the LDL.
They consulted collections mounted by the University of Louisville
(http://digital.library.louisville.edu/), Ball State University
(http://libx.bsu.edu/), University of Nevada, Las Vegas
(http://digital.library.unlv.edu/), and the University of California, San
Diego (http://ceo.ucsd.edu/index.html) to facilitate decision making
about the organization and display of online oral history materials.
The LSU Libraries serves all digital library materials via the
Louisiana Digital Library, which was developed at the start of this
decade by LSU Libraries and the LOUIS Library Consortium. LOUIS
staff maintains the LDL for the nineteen participating institutions,
including historical societies, libraries and museums. Individual
institutions add content to the LDL and all materials are available for
public use. The digital library is powered by CONTENTdm software
and hosted by OCLC. LOUIS staff assists LDL institutions with
customization of the software. LSU Libraries Special Collections
maintains over 35 collections in the LDL.
Adding audio collections to the LDL has been a slow process that
seems to move in fits and starts. Center staff and the Digital Services
Librarian have held many meetings and exchanged numerous emails
about serving digitized oral histories online. Debate about the topic
centered around how the interviews would be organized and
displayed. Many interviews, especially the life narratives, are topically
related even though they are in different series. For example,
university history overlaps with civil rights history in several
interviews. Organizing the interviews both topically and by series can
be achieved by using CONTENTdm custom queries to unite items
from different digital collections, although this method does require
staff to re-create the collection custom queries as topics or series are
added.
The CONTENTdm software seems more suited for its original
purpose to serve digital images, and the default treatment of audio
files is rather clunky. Audio does not play automatically, but instead
the text "Access this item" appears at the top of the screen and
metadata for the item below it. This presentation of the audio is
somewhat confusing, because it is not even immediately clear that it is
an audio file. Some institutions using CONTENTdm have devised
workarounds that make serving audio in the software more usable.
In order to better group interviews together with the
transcriptions and other related content, the Digital Services Librarian
began uploading files as "compound objects" or multi-part files in
CONTENTdm. Figure 2 illustrates this with the different files,
abstract, transcription, and audio, hyperlinked in the left column. This
display is not ideal since the metadata for the interview is on a
separate screen and the "Access this item" text is still present. In
addition to the cumbersome nature of the audio display, patrons
wishing to listen to it are forced to download the often very large file
to their computer. The Director felt strongly that other options not
requiring the patron to download the audio be explored. Copyright
would be difficult to manage if the audio was copied to different
computers.
After reading about Ball State University's development of a user-friendly
embedded Windows Media Player above the PDF file within
CONTENTdm (Hurford & Read, 2008), the Digital Services Librarian
contacted LOUIS about implementing this method. LOUIS staff
worked with the LSU Information Technology Services (ITS)
department to obtain access to a streaming server from which the
audio could be served. MP3 files are uploaded to the server via FTP
software and the file path is linked to the item in CONTENTdm in the
metadata field "Stream File".
The embedded player facilitates ease of use by providing the
searchable PDF transcription to the patron as they listen to the audio.
It does not require listeners to download the audio, thus it better
protects the copyright of the files. Information about copyright is
included in the metadata for each item and future transcriptions may
be watermarked with a copyright statement.
To organize the oral history collections in the LDL, the Digital
Services Librarian used the "collection of collections" model that
CONTENTdm employs to organize user collections on their website
(http://www.oclc.org/contentdm/collections/default.htm). The
individual series or collections are cataloged as a whole in the overall
Center LDL collection. Each series is represented by an image, and
selecting that image displays metadata that leads the patron to the
interviews. CONTENTdm software allows the creation of custom
queries that will link the different collections and enable patrons to
search across them. The individual series can be added to and the
interviews and other materials in the collections will remain together,
searchable alphabetically by title.
Problems and Some Solutions
Every digitization endeavor has its problems, but it is the individual
institution's staffing, resources, and prior experiences that dictate the
solutions. The Center, although small, is supported by a large
university library. Digitization is a luxury that can be afforded because
the Center has endowment money to purchase equipment and to
provide staff with continuing education in the field. The time it takes
to digitize resources is not a major factor in the continuation of
digitization either because digitization is accepted as a part of the
overall processing workflow. Digitization at the Center will be funded
indefinitely and a full time employee will be dedicated to the effort if
at all possible.
The Center is now two years into their programmatic digitization
effort. At this point the digitization workflow has been well established
and interviews from a few collections have been uploaded to the LDL.
This section of the paper details problems encountered, such as legacy
digitized collections, prioritizing digitization efforts, storage solutions,
staffing, and digital access and display via CONTENTdm software,
and how the Center staff and the Digital Services Librarian resolved or
did not solve them. Many problems could have been mitigated with
more long-term planning, but the degree to which digitization efforts
are currently supported and the ramifications of beginning a
digitization program were not known at the start of these efforts.
The Center holds some legacy digital collections that do not meet
the current standards for digitization. Prior to acquiring the
digitization station and hiring an audio technician, Center staff did
some preliminary digitization of analog tape using an external cassette
tape deck connected to a computer. The sound was captured with a
low-end sound card and saved in ram (Real Media Player) format in a process
like the one that Washington State University Libraries used for their
African-American Oral History collections (Bond, 2004). These early
recordings were deemed important enough to place in the queue to be
re-digitized according to the Center's current standards. For practical
purposes, an institution may choose to keep legacy digitized items
even if they do not meet current standards because the cost to re-
digitize is high. For the Center the lessons learned with early
experiments in digitization were important in shaping the future
decisions to allocate more funds and staff to the digitization efforts in
order to produce better quality sound.
The Center's at-risk materials were digitized first; however, some
of these materials are not good candidates for online access. The files
will need to be stored long term, but because of restrictions they will
be largely inaccessible. This falls within the mission of the Center,
which includes collecting in addition to providing access to oral
histories. Some audio files do not have completed transcriptions,
rendering them unacceptable for immediate uploading to the LDL. The
interview editing process is very time consuming and there are few
immediate results (Bond & Walpole, 2006). Digitization priorities
may differ depending on the institutional mission. If the mission is to
provide access and preservation is secondary, then more popular or
relevant collections should be digitized first. Institutions not
supported by a parent institution, such as the Center is by LSU
Libraries Special Collections, may not have the luxury to digitize
collections just to archive them.
Another ongoing issue is long term storage solutions for the
digitized files. The Center hoped to have files saved in at least three
different places: a dedicated server in the main library, CD, and offsite
storage. Some files are saved to an external hard drive in addition to
the networked server, and born digital audio is saved to Gold CD.
Ideally a copy of each master WAV file would be stored in offsite
storage in a similar set up to the University of Kentucky (Weig, Terry
& Lybarger, 2007), but this has not been implemented. The Libraries'
server on which all audio files are saved is backed up incrementally to
magnetic tape every night. Full backups take 40-120 hours because of
the amount of data contained on the servers, so they are conducted
once monthly. It is a secure system, but there is always a chance for
failure. Future plans call for the Center to assess file storage and
redundancy options.
The document "Sound Directions: Best practices for digital audio
preservation" provides recommendations for long term preservation
storage (Casey & Gordon, 2007); however, many recommendations
may not be feasible for small centers. The authors emphasize that file
redundancy, which is neither labor-intensive nor costly in media (e.g.,
CD or flash memory), should always be implemented. The majority of
institutions will likely not have multiple terabyte servers and staff to
keep them running, but files can at the very least be backed up to a
more affordable storage medium such as portable hard drive or CD.
Any storage medium can fail, however, so careful attention to this
matter is imperative if an institution is interested in long term storage
of files.
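One inexpensive way to pay that attention is to confirm periodically that each backup copy is still identical to the master. The sketch below compares SHA-256 checksums using only the Python standard library. It is an illustration, not a description of the Center's practice; the function names and the idea of passing in a list of copy locations are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large WAV files never sit fully in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(master: Path, copies: list[Path]) -> dict[str, bool]:
    """Report, for each copy, whether it exists and matches the master file."""
    expected = sha256_of(master)
    return {
        str(copy): copy.exists() and sha256_of(copy) == expected
        for copy in copies
    }
```

A copy reported as False is either missing or no longer byte-for-byte identical to the master, which signals that it should be replaced from a good copy.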
An issue that may require further review and assessment is the
current standard of capturing audio at the higher resolution of 24-bit,
96 kHz. As server space fills and the Center and Libraries' budgets
decrease, however, this standard may be reduced. Capturing audio at
16-bit, 44.1 kHz reduces the file size by more than half, and according to
some experts it does not substantially decrease the quality of the WAV
file (Weig, Terry & Lybarger, 2007). If the server is filled the Center
may elect to save the derivative optimized WAV file to CD rather than
the server. File optimization is time consuming, often taking the
length of the recording to complete, so deleting these files is not an
option.
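The effect of the capture settings on file size can be checked with simple arithmetic, since the size of uncompressed PCM audio is bit depth times sample rate times number of channels times duration. The sketch below assumes stereo capture, as described in the workflow, and ignores the small WAV header; the function name is made up for this example.

```python
def pcm_bytes_per_hour(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> int:
    """Uncompressed PCM audio data for one hour, ignoring the WAV header."""
    bytes_per_second = (bit_depth // 8) * sample_rate_hz * channels
    return bytes_per_second * 3600


master = pcm_bytes_per_hour(24, 96_000)    # current capture standard
reduced = pcm_bytes_per_hour(16, 44_100)   # possible lower standard

print(f"24-bit/96 kHz:   {master / 1e9:.2f} GB per hour")    # 2.07 GB per hour
print(f"16-bit/44.1 kHz: {reduced / 1e9:.2f} GB per hour")   # 0.64 GB per hour
print(f"Reduced file is {reduced / master:.0%} of the master size")  # 31%
```

Under these assumptions an hour of stereo audio takes roughly 2 GB at the current standard and well under 1 GB at the lower one.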
Before embarking on a digitization project, an institution should
estimate the number of files that will be created and storage space
needed. An institution may choose to capture audio at a lower and still
acceptable rate to expedite the digitization process and conserve
storage space. The institution should conduct an assessment of
whether file optimization and multiple WAV files are needed before
creating additional files that must be saved over the long term. Any
derivative files can be recreated, so they should always be deleted or
copied to more affordable storage media if server space is at a
premium.
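A rough estimate of this kind can be made before any tape is digitized. The sketch below is one way to do it, under stated assumptions: each hour of tape yields a fixed number of WAV versions (for example a master and an optimized file) plus one MP3 use copy, and about 2.07 GB per hour is used for 24-bit, 96 kHz stereo WAV (3 bytes x 96,000 samples x 2 channels x 3,600 seconds). The 100-hour collection and the 128 kbps MP3 bitrate are hypothetical values chosen for illustration.

```python
def estimate_storage_gb(
    hours_of_audio: float,
    wav_gb_per_hour: float,
    wav_versions: int = 2,      # e.g. master plus optimized WAV
    mp3_kbps: int = 128,        # assumed bitrate of the MP3 use copy
    copies: int = 1,            # number of places each file is stored
) -> float:
    """Estimate total storage in decimal gigabytes for a digitization project."""
    wav_gb = hours_of_audio * wav_gb_per_hour * wav_versions
    # kilobits per second -> bytes per second -> bytes per project -> GB
    mp3_gb = hours_of_audio * 3600 * (mp3_kbps * 1000 / 8) / 1e9
    return (wav_gb + mp3_gb) * copies


# Hypothetical 100-hour collection captured at 24-bit, 96 kHz stereo
one_copy = estimate_storage_gb(100, 2.07)
three_copies = estimate_storage_gb(100, 2.07, copies=3)
print(f"One copy of everything:  {one_copy:.0f} GB")
print(f"Three copies (redundant): {three_copies:.0f} GB")
```

Setting wav_versions to 1 shows how much space is saved if the optimized WAV is not retained, which is the kind of assessment recommended above.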
Another issue related to the audio capture standards is the
Center's lack of written standards and best practices. Workflow
principles and digitization methods are generally adhered to, but there
is no guide or manual, just institutional knowledge. The workflow is
based on recommendations by oral history expert Doug Boyd, who
served as an advisor to the Center and also wrote the tutorials and
information found on the Oral History Association website
(http://www.oralhistory.org/technology/). The Center should apply
the same level of detail and documentation to digitization workflow as
they have for the processing workflow.
There are only two staff members who work with the Center's
digital files, which could pose potential problems if either leaves and
has often caused bottlenecks in the workflow. At the Center all
digitization is handled by one staff member with some student
support. Other Center staff members do not have time to perform
these duties, so little cross-training has been done. This is a risk
because if the staff person leaves it will be difficult to continue
digitization efforts. In the same vein, only the Digital Services
Librarian currently uploads items to the LDL. This duty is usually
shared by graduate assistants, but financial constraints have
prevented hiring any additional help. Digitized files often do not get
uploaded quickly because they are placed in a queue with all Special
Collections digital materials. Cross-training between the digital
technician and Librarian is an option that should perhaps be explored.
At the very least the two individuals, who are separated geographically
across campus, should establish better communication and more
effective workflows. Information about which collections are ready to
be uploaded to the LDL is sent ad hoc via email and there is no current
mechanism for tracking the LDL files via the Accession database.
Institutions should establish a clear workflow and assign
responsibility for different aspects of the digitization process early on
in a project. This will alleviate any potential miscommunication or turf
war situation.
Before purchasing equipment and hiring staff to digitize audio, an
institution should assess the environment where they will be located.
At the Center the digitization station is equipped with the right
hardware and software, but its location is less than ideal. The Center is
located in an 80-year-old house that is poorly insulated. The room in
which digitization takes place is in the center of the house next to the
building air handlers. The sound technician must use headphones
while optimizing audio. If the Center is relocated much thought will be
put into the location of the digitization station. In addition, Dr. Boyd
recommended the Center purchase two digitization stations. When
funding is available, the Center will explore this option.
A very important aspect of digitization efforts is providing access.
The Center works with the Digital Services Librarian to upload items
to the LDL, which uses CONTENTdm software. The software is less
than perfect in its treatment of audio files, and efforts to retrofit the
software to better serve audio are time consuming and dependent on
LOUIS staff expertise. LOUIS controls server access so software
customization must go through them. The Center benefits from being
a part of this consortium environment where an infrastructure is in
place and support is offered at all times, but there are some
constraints that this relationship brings. Small or not well-funded
institutions interested in mounting collections online may be better
served entering into a partnership with a larger institution or
consortium.
An issue specific to the retrofitting of the software potentially
affects patron access and sustainability. The embedded audio player
that LOUIS retrofitted for audio display does not display a time stamp, so
patrons cannot skip to a specific section of the interview. The audio
player works well in the most current version of CONTENTdm, but
the software is scheduled to be upgraded soon. Changes may affect the
workflow and change the player functionality. The Center will rely on
LOUIS consortium staff to recreate the embedded player in the
upgraded software. Some institutions may not be able to expend a
great deal of staff time continually addressing the interface when the
software is upgraded, so this should be considered when addressing
the sustainable access points.
In many ways the process for adding audio collections to the LDL
has just begun. In 2008 all processed oral history collections which
had been on a cataloging backlog were cataloged in the Libraries'
OPAC and WorldCat, which facilitates the metadata creation of
records in the LDL. Changes in the CONTENTdm software in the past
few years have made it more customizable. In 2009 the Center staff
began producing audio and video podcasts with images and sound
from the collections. The podcasts and information about them are
available on the Center's blog (http://oralhistory.blogs.lib.lsu.edu/).
In order to maximize the amount of digitized materials that are
available online, key players should outline a digital access plan
wherein all materials that are currently ready for public display are
listed and other materials are prioritized.
Conclusion
The T. Harry Williams Center for Oral History began a digitization
program a little more than two years ago. Since then the Center has
acquired digitization hardware and software, hired a full time staff
member to perform digitization duties, and mounted several
collections to the Louisiana Digital Library. By all accounts, the
Center's efforts have been successful, although they hope to develop
more sound workflows for digital access to enable them to add
additional interviews to the online collections in the future.
Institutions wishing to emulate the Center should consult experts in
person or through the literature, follow industry standards set forth by
the Oral History Association (http://www.oralhistory.org), and
formulate plans based on best practices such as the CDP Digital Audio
Working Group Digital Audio Best Practices
(http://www.bcr.org/dps/cdp/best/digital-audio-bp.pdf). It is
essential to plan ahead for storage space needs, keeping in mind that
the space one expects to need is probably less than what will actually
be required.
References
Bond, T. J. & Walpole, M. (2006). Streaming audio with synchronized
transcripts utilizing SMIL. Library Hi Tech, 24, 452-462.
Bond, T. J. (2004). Streaming audio from African-American oral
history collections. OCLC Systems & Services, 20, 15-23.
Casey, M. & Gordon, B. (2007). Sound directions: Best practices for
audio preservation. Retrieved from:
http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf
Hurford, A. A. & Read, M. L. (2008). Bringing the voices of
communities together: The Middletown digital oral history project.
Indiana Libraries, 27, 26-29.
Integration and the black experience. (2003). Retrieved December 14,
2009 from
http://www.louisianadigitallibrary.org/cdm4/browse.php?CISOROOT=/IBE
T. Harry Williams Center for Oral History Exhibits and Presentations.
(2009). Retrieved December 14, 2009 from
http://www.lib.lsu.edu/special/williams/ep.html
The Baton Rouge Bus Boycott of 1953. A recaptured past. (2004).
Retrieved December 14, 2009 from
http://www.lib.lsu.edu/special/exhibits/boycott/index.html
Weig, E., Terry, K. & Lybarger, K. (2007). Large scale digitization of
oral history: A case study. D-Lib Magazine, 13. Retrieved from:
http://www.dlib.org/dlib/may07/weig/05weig.html
Digitizing a Newspaper Clippings
Collection: a Case Study and Framework
for Small-Scale Digital Projects
Maureen M. Knapp (John P. Ische Library, New Orleans)
Abstract
How does a small specialty library establish, develop and maintain in-
house digital collections? What are the considerations, challenges, and
benefits they experience? This chapter describes one library's
experience in turning an aging and inaccessible collection of
newspaper clippings into a preserved and searchable online collection,
which in turn laid a basis for other digital projects. This chapter also
discusses considerations, challenges and opportunities observed
during their first foray into creating a digital collection.
Keywords: Clippings, Digital libraries, Digital preservation, Digital
projects, Digitization, Electronic preservation, Newspaper clippings
file, Newspaper clippings, Press clippings.
Background
The John P. Ische Library is a mid-sized, urban, academic health
sciences library serving six schools of health professions at the LSU
Health Sciences Center (LSUHSC) in New Orleans, Louisiana.
Established in 1931, the library has collected newspaper clippings
related to the history and accomplishments of the health sciences
institution since its inception, and even today monitors the local
papers for pertinent news items. The "newspaper clippings file," as it
came to be called, is an astounding 70-year snapshot of the
development of the health sciences in Louisiana. Over 6,000 clippings
trace development of LSUHSC through the twentieth century,
including such topics as: the people, places and events associated with
the LSU School of Medicine, the growth of health infrastructure in
Southeast Louisiana and New Orleans, and the development of 20th
century health sciences education in Louisiana.
Digital Collection Origins
In 2002, access and preservation concerns with some of the earliest
newspaper clippings encouraged the library to investigate digitization
as a possible solution. Access points to the collection were limited. The
only online access consisted of a locally-created subject database
containing basic citations to newspaper articles from 1985 to present.
Users had to search the local database by faculty name or department,
and then locate the physical newspaper clippings in filing cabinets by
call number. The remaining fifty-odd years of the collection (1933-1984)
were indexed in a card catalog, stored in the library's back
offices and only accessible to library staff.
Numerous problems plagued the physical collection. The
newspaper clippings had been stored in filing cabinets as they were
collected, which allowed the typing paper to curl heavily over the
course of many years. The newsprint itself showed signs of age: rust
marks appeared where staples and paperclips had once connected
pages, and gaps in the collection were apparent.
A lack of funding and staffing was another concern. Any efforts
towards creating a digital collection would have to be inexpensive and
make use of staff and resources the library already possessed.
However, to truly understand the physical condition of the
newspaper clippings file, and the challenges that would arise once
digitization began, one must understand the collection process of
gathering the original newspaper clippings. While no documentation
exists, the library postulates that even back to the 1930s, a library
member would skim the daily local papers from around Southeast
Louisiana for any mention of LSU School of Medicine, and its faculty,
staff or students. Once an article was discovered, it was cut out of the
paper, dated, and the name of the paper was noted. The articles were
glued to standard 8 1/2 by 11 inch typing paper, usually several to a
page, somewhat in order by date, and the paper was assigned a
numerical call number in the order they were received. Later someone
would read the articles, underline named entities pertaining to LSU,
and assign a subject heading, which was recorded in a small local card
catalog. Finally, the pages of clippings were organized into manila
folders by year and placed into filing cabinets until further needed.
This entire process continued for 50 years.
So basically, the library had a unique local news collection,
spanning the majority of the 20th century, collected and stored under
questionable archival methods, with limited access to documents
before 1985. In order to increase availability and use of the clippings,
the library wrote a grant proposal for a small-scale digitization project
to scan the newspaper clippings from 1933-1953, streamline
cataloguing, and offer public access to the resource online. The grant
proposed using Greenstone digital library software, an open source
"suite of software for building and distributing digital library
collections" (Greenstone digital library software, 2007), to provide
access to the digitized newspaper clippings.
Stops and Starts
Though the grant proposal was rejected, the grant writing process did
provide a catalyst for action within the library. The small grant
requested $3,000 to purchase a flat-bed scanner, computer and
optical character recognition software. Library administration was
impressed enough with the grant's digitization plan that they provided
funding for a scanner, software and travel to a continuing education
class on digital projects in 2003. A library staff member began
scanning the clippings. However, the library quickly ran into
problems. The Greenstone software would not work properly on their
secure intranet, and the library lacked a staff member with enough
computer programming experience to install and troubleshoot the
software properly. In addition, the image quality of the scanned
newspaper clippings was poor, which was attributed to a faulty
scanner that did not produce dark enough images. Finally, copyright
concerns made library administration hesitant to post the collection
online to the general public.
By the time Hurricane Katrina struck New Orleans in August
2005, access, software and image quality issues had put the library's
newspaper clippings digitization project on hold. The library's
collection was undamaged by this natural disaster. However, it was
moved to remote storage for over half a year and the entire library
staff was displaced.
During the ensuing hiatus, library staff took several continuing
education classes on digitization. "Digitization Fundamentals," a
course offered by the Illinois Digitization Institute at the University of
Illinois Urbana-Champaign (University of Illinois Library, 2009), was
exceptionally useful, as it provided training in digital projects
management, standards and organization, as well as an introduction
to Photoshop software.
In 2007, an opportunity opened for the library to join the
Louisiana Digital Library, the state digital library consortium provided
through LOUIS: The Louisiana library network (LOUIS: The
Louisiana library network, 2009). The library was able to obtain
access to OCLC's CONTENTdm platform, which was previously too
expensive, as well as the technical infrastructure and support needed
to store and access digital assets.
Consortial membership for digital library services addressed
many of the problems faced by the library developing an in-house
digital collection. The documentation on the technical and operational
requirements for participation in the LOUISiana Digital library
proved critical. The consortium's style manual for scanning and
cataloguing provided guidelines for selecting collections to digitize,
scanning practices, post-scanning image manipulation, project
workflows, metadata standards, and quality control. Another practical
advantage to consortial membership was LOUIS staff support, which
provided advice on imaging standards, basic training on the
CONTENTdm software, and a shoulder to cry on when things went
awry.
The library began their second try at developing a digital version
of the newspaper clipping file in January 2008. As of December 2009,
the library has not only met their original goal of digitizing and
indexing over 1600 items in the collection from 1933-1953 (LSUHSC
New Orleans library, 2009), but also created several other collections.
Work Flow, Image Manipulation and Standards
The format and organization of the newspaper clippings collection
created a challenge with regard to digital manipulation and workflow.
In order to achieve indexing of items on an individual level, some
information that was included only once on a sheet of several
newspaper clippings (for example, the name of the newspaper, the
date, and most commonly, the clipping's call number) would have to
be added to each individual item. Thus, several steps beyond simple
scanning and image processing were included in the workflow.
Here are the workflow and standards for creating digital versions
of the Newspaper Clipping File:
1. Following consortium standards for creating digital images for the
Louisiana Digital Library, the full-page newspaper clipping is
scanned on an HP Scanjet 8390 flatbed scanner to create an
archival black and white image at 300 dpi, 8-bit grayscale and
saved as an uncompressed TIFF file on the library server. This
creates an archival master version of the original digital image.
2. Using Photoshop, a copy of the archival master version is opened
and saved according to file naming conventions for the digital
library set forth by the consortium. This creates a duplicate of the
archival master that can be manipulated to isolate an individual
clipping. This file is the image that will eventually be loaded into
the digital collection.
3. The duplicate is cropped to isolate a single newspaper clipping.
Pages that have only one clipping on them are also manipulated
and cropped to minimize file size.
4. If not visible, the call number, date and newspaper name from the
original scan are copied, cut and pasted to the now isolated
clipping.
5. Post capture processing is applied. The item is processed for
alignment and an unsharp mask filter is applied to correct blurring
that might have occurred during the scan process. In addition, the
image's histogram is viewed to adjust color intensity.
6. The processed image of the individual newspaper
clipping is saved to the server.
7. For pages with more than one newspaper clipping, this process is
repeated until all clippings have been isolated.
8. After digital manipulation, the TIFF of the clipping is loaded into
the CONTENTdm Project Client. Cursory metadata is entered by a
library staff member. The file name, size and location are recorded
in a Scanning Log to track progress.
9. The librarian performs Optical Character Recognition (OCR) on
the clipping to create an excerpted text field and assigns subject
headings. OCR produces an abstract of the first 50 words of the
article, which is keyword searchable in the digital library. This
takes a bit of time, but it is a good way to review the article and
assign the proper subject heading. After a final quality check, the
item is approved and uploaded to the digital library. Upon upload,
CONTENTdm converts the full resolution TIFF file to JPEG,
which is what end-users access when viewing the collection online.
10. CONTENTdm also offers an Archival File Manager, which
automatically archives collections in a location specified on our
library server as they are uploaded to the online collection. Once a
volume is full, it is burnt to an archival quality CD recordable disc,
as well as saved on the server.
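The 50-word excerpt described in step 9 can be illustrated with a short sketch. This is not the library's actual procedure or any CONTENTdm or OCR software function; the function name and the sample text are hypothetical, and the sketch assumes that "words" are simply whitespace-separated tokens in the OCR output.

```python
def excerpt_first_words(ocr_text: str, max_words: int = 50) -> str:
    """Return the first `max_words` whitespace-separated words of OCR output."""
    # split() with no arguments collapses runs of spaces and line breaks,
    # which are common in raw OCR text.
    words = ocr_text.split()
    return " ".join(words[:max_words])


# Hypothetical OCR output with irregular spacing and line breaks.
sample = "Dental  school\nopens new clinic " + "word " * 60
excerpt = excerpt_first_words(sample)
print(len(excerpt.split()))
```

An excerpt built this way still needs the human review the step describes, since OCR errors pass through unchanged.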
Cataloging and Metadata
The LOUIS consortium requires collections in the Louisiana Digital
Library to use the Dublin Core 15 metadata element set (Dublin Core
Metadata Initiative, 2008), in addition to non-Dublin Core structural
and administrative metadata. CONTENTdm allows up to 125 fields
per collection. The library decided to add three more metadata fields
to the newspaper clippings collection: Call Number (to locate the item
in the physical files), Full Text (for excerpted text) and Contact
Information (so users can contact the library). The following table
lists the metadata fields used in the newspaper clipping collection.
Field Name | Type of Metadata | Content | Added by
Title | DC | Title of newspaper clipping | LS
Contact Information | A | Contact information for library | T
Creator | DC | Author of clipping | L
Contributors | DC | Contributor to clipping (rarely used) | L
Subject | DC | Institutional controlled vocabulary, MeSH | L
Call Number | D | Call number for the original clipping | LS
Description | DC | "Newspaper clipping" | T
Notes | D | More descriptive information about content of original clipping, if needed | L
Publisher | DC | Newspaper title | L
Date | DC | Date of publication | L
Type | DC | "Text" | T
Format | DC | "TIFF" | T
Identifier | DC | Mandatory field directs users to identifier URL | T
Source | DC | Library name and homepage URL | T
Language | DC | "En." | T
Relation | DC | URL to homepage of Newspaper Clippings Collection | T
Coverage - Spatial | DC | "New Orleans (La.)" | T
Coverage - Temporal | DC | Year of publication | L
Rights | DC | Copyright information | T
Cataloger | D | Initials of librarian | L
Cataloged Date | D | Date of cataloging | L
Object File Name | D | File name of item | LS
Image Resolution (Archival) | A | Dots-per-inch of scanned TIFF, i.e. "300dpi" | T
Image Bit-Depth (Archival) | A | "8-bit" | T
Color Mode (Archival) | A | Grayscale | T
Extent (Archival) | A | Pixel dimensions of image (WWWW:HHHH) | LS
Image Manipulation (Archival) | A | "Crop, alignment, unsharp mask, histogram" | T
File Size (Archival) | A | Size of TIFF image in KB | LS
Hardware / Software (Archival) | A | "HP Scanjet 8390, Photoshop, ABBYY FineReader" | T
Digitized By | A | Initials of library staff member | LS
Digitized Date | A | Date of digitization | LS
Full Text | D | Abstracted content from OCR | L
List of metadata elements used in cataloging items. Meaning of symbols: A is
administrative; D is descriptive; DC is Dublin Core 15; LS is added by Library Staff, L
is added by Librarian, and T is added by Template.
Many of these fields are inserted automatically via a template in
CONTENTdm. The remaining fields are divided among project
members. The most tedious data entry was entering the Extent and
File Size fields for each item. Each clipping's dimensions and file
size differ, so library staff tend to write these down on a notepad as
they scan images, then record them in CONTENTdm and the
scanning log later.
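As a rough illustration of how template values and per-item values combine into one record, the sketch below merges a set of fixed defaults with item-specific fields and formats the Extent and File Size values. It does not use or represent the CONTENTdm template feature; the function names, the sample title and call number, the zero-padding of WWWW:HHHH, and the convention of 1 KB = 1024 bytes rounded up are all assumptions made for the example.

```python
# Values the metadata table marks as supplied by the template ("T").
TEMPLATE_FIELDS = {
    "Description": "Newspaper clipping",
    "Format": "TIFF",
    "Language": "En.",
    "Coverage - Spatial": "New Orleans (La.)",
    "Image Resolution (Archival)": "300dpi",
    "Image Bit-Depth (Archival)": "8-bit",
    "Color Mode (Archival)": "Grayscale",
}


def format_extent(width_px: int, height_px: int) -> str:
    """Format pixel dimensions as WWWW:HHHH (zero-padding is an assumption)."""
    return f"{width_px:04d}:{height_px:04d}"


def size_in_kb(size_bytes: int) -> int:
    """Convert bytes to whole kilobytes, rounding up (assumes 1 KB = 1024 bytes)."""
    return -(-size_bytes // 1024)


def build_record(item_fields: dict, width_px: int, height_px: int, size_bytes: int) -> dict:
    """Start from the template defaults, then add the per-item values."""
    record = dict(TEMPLATE_FIELDS)
    record.update(item_fields)
    record["Extent (Archival)"] = format_extent(width_px, height_px)
    record["File Size (Archival)"] = f"{size_in_kb(size_bytes)} KB"
    return record


# A 3 x 5 inch clipping at 300 dpi is 900 x 1500 pixels; at 8 bits per
# pixel the uncompressed image data is about 900 * 1500 = 1,350,000 bytes.
record = build_record(
    {"Title": "Example clipping title", "Call Number": "EXAMPLE-001"},
    width_px=900,
    height_px=1500,
    size_bytes=1_350_000,
)
print(record["Extent (Archival)"], record["File Size (Archival)"])
```

Deriving the two values from the image at scan time, rather than copying them from a notepad, is one way the tedious entry described above might be reduced.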
Another feature of CONTENTdm is the ability to build a customized
controlled vocabulary for the Subject field. This worked to the library's
advantage, as the newspaper clipping file possessed a card catalog of
subjects. The library uses the newspaper clippings card catalog as a
basis to build an institutional controlled vocabulary in the digital
library. The card catalog also serves as a reference point to verify
names and spellings of affiliated persons. This institutional controlled
vocabulary can be shared across digital collections, which is an
advantage for future projects related to our institution.
The library soon recognized that other subjects would be
necessary to adequately describe the digitized newspaper clippings.
Original cataloging varied so much over the years that clippings might
only include the name of the person or entity mentioned in the article.
The library wanted to add more descriptors, so that articles describing
conferences, publications, research grants or other common topics
were easier to locate. When applicable, the library consults the
National Library of Medicine's list of Medical Subject Headings
(MeSH) (U.S. National Library of Medicine, 2009) for appropriate
descriptors in the Subject field. For example, the MeSH term
"Congresses as Topic" is used when a clipping discusses conferences,
or the MeSH term "Publications" when a clipping mentions a new
book or journal article published by one of the institution's faculty.
Sometimes, MeSH is not useful, especially when discussing local
events such as campus expansion or departmental news. In these
cases, a subject heading is created and assigned by the librarian.
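The idea of checking proposed headings against a controlled vocabulary can be sketched as follows. This is only an illustration, not CONTENTdm's controlled vocabulary feature: the function name is hypothetical, the vocabulary is a three-term stand-in, and "Campus Expansion" and "Hurricane Recovery" are invented local headings used purely as examples.

```python
def split_subjects(subjects, vocabulary):
    """Separate proposed headings into approved terms and terms needing review."""
    # Compare case-insensitively, but return the vocabulary's own spelling.
    normalized = {term.casefold(): term for term in vocabulary}
    approved, needs_review = [], []
    for heading in subjects:
        key = heading.strip().casefold()
        if key in normalized:
            approved.append(normalized[key])
        else:
            needs_review.append(heading.strip())
    return approved, needs_review


# Two MeSH terms named in the text plus one hypothetical local heading.
vocabulary = {"Congresses as Topic", "Publications", "Campus Expansion"}

approved, needs_review = split_subjects(
    ["publications", "Congresses as Topic", "Hurricane Recovery"],
    vocabulary,
)
print(approved)
print(needs_review)
```

Headings that land in the review list are the ones a librarian would either correct or add to the institutional vocabulary as a new local term.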
Clippings in the digital collection can be browsed by year, subject,
creator or title. Browsing by date is an interesting way to view the
development of institutional history. To further open the collection,
keyword searching is enabled in the excerpted text field.
Project Considerations
Storage, standards, documentation, training and staffing were all
considerations for this project.
Storage was a huge concern. The deteriorating condition of older
newspaper clippings made it evident that storing the physical
newspaper clippings in filing cabinets was not conducive to
preservation. To address the curling paper, books were used to weight
down the paper for several weeks. This did not entirely fix the issue of
curling paper, but it did help a little in preparing the clippings for a
move to flat storage. After flattening, the files were transferred to acid-
free archival folders and placed in clamshell archival storage boxes.
Finally, the clamshell boxes of physical files were relocated to the
library's humidity-controlled Rare Books Room, in order to protect
them from humidity and sunlight.
Likewise, the library was heedful of digital storage and the "digital
mortgage": how will the library address transfer of archival TIFF files
to new formats as software and hardware change? Though the library
has yet to encounter a change in image format standards, it did
attempt to prepare for this inevitability by storing the collection of
archival images in multiple locations, as well as on multiple media.
Having multiple copies also addresses the possibility that some files
might eventually become corrupted. TIFF versions of the images are
burnt to archival-quality, professional-grade CD recordable discs, as
well as copied to a location on the library server, which is maintained
by our institution and backed up daily to tape at a remote location.
This is in addition to the processed JPG file that is available to the
public on the Louisiana Digital Library. A TIFF of the raw scan of the
original newspaper clipping is also retained on the library server.
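Keeping several copies helps against corruption only if a damaged copy can be recognized. The chapter does not describe how copies were compared; one common approach, shown here purely as an assumption, is to compute a checksum for each copy and confirm that they agree. The file names below are throwaway stand-ins for TIFF copies on the server and on disc.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths) -> bool:
    """Return True when every copy has the same SHA-256 digest."""
    return len({sha256_of(Path(p)) for p in paths}) == 1


# Demonstration with temporary files standing in for stored copies.
with tempfile.TemporaryDirectory() as tmp:
    server_copy = Path(tmp) / "server.tif"
    disc_copy = Path(tmp) / "disc.tif"
    server_copy.write_bytes(b"example image bytes")
    disc_copy.write_bytes(b"example image bytes")
    intact = copies_match([server_copy, disc_copy])

    # Simulate corruption of one copy by changing a single byte.
    disc_copy.write_bytes(b"example image bytez")
    damaged = copies_match([server_copy, disc_copy])

print(intact, damaged)
```

A mismatch shows that the copies differ but not which one is correct, so a digest recorded at scan time would be needed to identify the good copy.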
With multiple storage locations and a complicated workflow,
documentation and staff training are also important concerns. The
library's consortial membership provided a style manual for scanning,
cataloging/metadata standards, and basic workflow suggestions. The
library used this as a basis for creating a local workflow policy, which
includes detailed directions on image scanning and manipulation as
well as step by step directions on how to process the item in
CONTENTdm. A scanning log is used to track size and progress of a
collection. The scanning log is simply an Excel file which records the
file name, file size, and date of digitization, as well as locations to
which the file has been saved.
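The scanning log described above could be kept in any tabular form. The sketch below uses a CSV file, which Excel can open, rather than a native Excel workbook; that substitution, the column names, and the sample file names are assumptions made for illustration and do not reproduce the library's actual log.

```python
import csv
import io
from datetime import date

# Columns mirror what the chapter says the log records.
LOG_COLUMNS = ["file_name", "file_size_kb", "date_digitized", "saved_locations"]


def append_log_row(handle, file_name, file_size_kb, digitized_on, locations):
    """Append one item to the log, writing the header row if the log is empty."""
    writer = csv.DictWriter(handle, fieldnames=LOG_COLUMNS)
    if handle.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "file_name": file_name,
        "file_size_kb": file_size_kb,
        "date_digitized": digitized_on.isoformat(),
        "saved_locations": "; ".join(locations),
    })


# An in-memory buffer stands in for a log file opened in append mode.
log = io.StringIO(newline="")
append_log_row(log, "example_0001.tif", 1319, date(2009, 12, 9),
               ["library server", "archival CD"])
append_log_row(log, "example_0002.tif", 1204, date(2009, 12, 9),
               ["library server"])

rows = list(csv.DictReader(io.StringIO(log.getvalue(), newline="")))
print(len(rows), rows[0]["saved_locations"])
```

Recording every storage location per file makes it straightforward to see later which items still lack a second copy.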
Regarding training, the library realized it was critical that
everyone involved with the project learn Photoshop. The LOUIS
consortium takes a 'train the trainer' approach to CONTENTdm, so
the librarian was responsible for training local staff on the software
after initial training.
This project is staffed with one librarian and two library staff
members, who devote about 10 hours a week to this project. Library
staff is requested to scan and process 60 clippings per week.
Scheduling issues quickly became apparent for the librarian project
manager, who has bibliographic instruction and reference desk duties
in addition to overseeing digital projects. A supervisor suggested
setting aside one day a week to solely devote to digital projects. Friday
has since become "Digitization Day" and has worked well in keeping
the load of items to be processed and approved by the librarian to a
reasonable amount.
Benefits and Challenges
One of the first challenges was software sustainability. The free
Greenstone digital library software did not work within the
institutional intranet and required higher level technical skills than
the library possessed. In addition, problems with the original project
scanner resulted in poor quality images that had to be redone.
Support from your institution from inception is critical.
Administration has to be on board to provide funding and act as a
liaison to other resources, for example, consulting with your
institution's legal department about copyright questions. Support
from information technology (IT) is also important. Getting our IT
department to provide support for open source library software was a
challenge that soon put the library's original plans to use Greenstone
digital library software on hiatus. One of the benefits of membership
in a state digital library consortium is that technical support is
provided in an automated, timely manner. In addition, the consortium
has direct contacts with the software developers at CONTENTdm, so
software concerns are quickly addressed.
The newspaper clippings collection is unique in that it collects
clippings from many regional news sources. All materials were
published after 1923. Therefore, the work may be protected by
copyright until 2018. Violation of copyright was a large concern, so the
library decided to restrict access to the images within the newspaper
clippings collection to the institutional IP address. In order to share
the collection with a larger audience, the collection's metadata is
searchable and viewable to anyone. This way, any user can find items
in the newspaper clippings collection, and if they are not from the
institution, the library works with them to get the information or
clippings they need.
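The access rule described above (images limited to institutional addresses, metadata open to everyone) can be expressed as a simple check. This is a conceptual sketch, not the way CONTENTdm or the consortium enforces restrictions; the function names are hypothetical and the network shown is a reserved documentation range used as a placeholder for the institution's real addresses.

```python
import ipaddress

# Placeholder range reserved for documentation; a real deployment would
# list the institution's own networks here.
INSTITUTION_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def can_view_image(client_ip: str, networks=INSTITUTION_NETWORKS) -> bool:
    """Images are shown only to clients inside an institutional network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


def can_view_metadata(client_ip: str) -> bool:
    """Metadata is searchable and viewable by anyone."""
    return True


print(can_view_image("192.0.2.15"), can_view_image("198.51.100.7"))
```

An outside user in this scheme can still discover a clipping through its metadata and then contact the library, which matches the service model the paragraph describes.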
Funding is a final challenge. Consortial membership to the digital
library is about $2000 a year, while hardware and software ran about
$1500 in startup costs. In addition, the library director donated a 21"
screen won at a library conference raffle for use with the digital
projects computer. Digital imaging is much easier with a larger screen.
Grants and scholarships are another source of funding. A scholarship
from a regional medical library group helped fund attendance at the
first continuing education class on digital imaging and metadata for
the librarian project manager. A recent Institute of Museum and
Library Services "Connecting to Collections" Bookshelf grant
(Institute of Museum and Library Services, 2009) allowed the library
to obtain a set of conservation resources and books, which it
previously lacked.
The library now has over 10 years of institutional history available
online in a searchable database. Visibility of and access to this
collection have increased. Indexing through OCLC allows results to appear in
Google. As a result, the library has received several inquiries about
subjects indexed in the newspaper clipping file from the United States
and Italy. The clippings file has also acted as a catalyst for change,
inspiring library staff to organize the rare books room, research
archival storage methods, and apply for grants. One of the benefits the
library is proudest of is the mentoring opportunity this created. A
library staff member who helped start this project recently completed
their library degree and went on to become a Digital Initiatives
librarian at another local library.
The library has established a workflow and gained experience in
digital imaging and management for future projects. Because of the
success in creating the newspaper clippings collection, the LSUHSC
School of Dentistry started a digital collection of historic photographs.
In addition, the library worked with the LSUHSC Registrar's Office to
digitize graduation program records, which are now available in a
public, searchable collection. Finally, the library is in the planning
stages of creating a digitized version of early volumes of the medical
school student newspaper. The library also continues to add items to
the newspaper clippings collection.
As one can surmise, it has been a long 4 years to produce this
digital collection, but once the library established workflow and
standards it was much easier to begin other projects. Support from the
state library consortium certainly expedited and streamlined the
process, and the library recommends state or regional consortium
membership to any smaller institution considering developing a
digital project. For all the tedious data entry and malfunctioning
software, the creation of an enduring, searchable and accessible
source of institutional history made the entire project worthwhile.
References
DCMI Usage Board. (2008). DCMI type vocabulary. Retrieved
December 9, 2009, from
http://dublincore.org/documents/dcmi-type-vocabulary/
Dublin Core Metadata Initiative. (2008). Dublin core metadata
element set, version 1.1. Retrieved December 9, 2009, from
http://dublincore.org/documents/dces/
Greenstone digital library software. (2007). Retrieved December 9,
2009, from http://www.greenstone.org/
Institute of Museum and Library Services. (2009). Connecting to
collections: A call to action. Retrieved December 9, 2009, from
http://www.imls.gov/Collections/
LOUIS: The Louisiana library network. (2009). Retrieved December
9, 2009, from http://appl003.lsu.edu/ocsweb/louishome.nsf/
LSUHSC New Orleans Library. (2009). LSUHSC New Orleans
newspaper clippings collection homepage. Retrieved December 10,
2009, from http://www.louisianadigitallibrary.org/cdm4/
index_LSUHSC_NCC.php?CISOROOT=/LSUHSC_NCC
U.S. National Library of Medicine. (2009). Medical subject headings -
homepage. Retrieved December 9, 2009, from
http://www.nlm.nih.gov/mesh/meshhome.html
University of Illinois Library. (2009). Digital services and
development — training. Retrieved December 9, 2009, from
http://images.library.uiuc.edu/projects/newproj.htm
METRO Grant Success Story: Waterways
of New York Project
Claudia A. Perry and Thomas T. Surprenant
(Queens College, CUNY.)
Abstract
The concept of experiential learning is particularly useful when
students are required to create database entries as part of an ongoing,
real-life, online experience. A METRO grant resulted in an
opportunity to use students to create a CONTENTdm database which,
with the continued software support from METRO, has continued and
evolved until the present. This chapter describes the experience of
both faculty and students. Sections include the background, technical
issues and implications for teaching, project procedures and
workflow, successes and lessons learned, challenges and next steps.
Of particular interest is the use of out of copyright postcards and the
metadata that has resulted from intensive student study and
evaluation of the data contained on these cards. Those contemplating
a digitization project of their own will be able to learn much about best
practices, project planning, management and the
advantages/disadvantages of the CONTENTdm software.
Keywords: Best Practices, Canals, Case Studies, Cooperative
Learning, Digitization, Digital Collection Management Software,
Digital Collections, Digital Imaging, Experiential Learning, Library
Education, Metadata, Postcards, Project Based Learning, Project
Management, Project Planning, Quality Control, Standards, Student
Developed Materials, Student Participation, Student Projects,
Waterways.
Introduction
For many of us, hands-on learning is the best way to integrate an
understanding of principles and best practices with a practical grasp
of the actual challenges and learning opportunities of a project. This
is particularly true for library school graduate students seeking to
expand their theoretical, technical and management skills. As
digitization is increasingly seen as a worthy endeavor for even the
smallest institutions, it is worth considering the range of approaches
available for gaining needed expertise, especially at the novice level.
Examining the long-term development of an integrated, semester-
long, course-based approach to digitization may be of value for those
seeking an inexpensive approach for the creation of small to medium-
sized digital collections.
A course entitled "Introduction to Digital Imaging" was first
taught at the Queens College Graduate School of Library &
Information Studies (GSLIS), City University of New York (CUNY), in
the Fall of 2003. In the Spring of 2005, a year-long METRO-funded
grant facilitated a co-operative project between the Rosenthal Library
and the GSLIS to support student digitization of a portion of the
Queens College Rosenthal Library Archives (e.g. see GSLIS, 2005-
2009, Digitization projects). The project included a variety of forms
and formats. The evaluation of this valuable learning experience
identified a strong need to find a single standard format that was
information rich and moderate in scope, but which lent itself to more
uniform metadata standards and digital specifications. The evolving
project, "Waterways of New York", an online digital collection of
historical postcards, was created in 2006, and partially supported by
METRO through continued access to CONTENTdm. It continues to
be extended by GSLIS students each semester the course is taught.
Scope and Format
The most important feedback provided to our team by METRO
digitization experts regarding our "Rosenthal Library Archives"
initiative was the value of working with a limited number of
manageable formats and a relatively focused subject area and time
frame. During the implementation of the grant a serious problem was
the complexity resulting from too many different types of media, the
overly wide range of subject matter, and the challenges these
characteristics presented to the creation of consistent metadata.
One of the GSLIS professors, Thomas Surprenant, has an ever-
expanding collection of Erie Canal and related New York State
waterways antique postcards, which addressed many of the problems
noted in the METRO feedback. In particular, by selecting a single,
simple, information-rich format (postcards published before 1923),
copyright concerns were eliminated and only a single set of
digitization specifications needed to be developed. METRO's
willingness to host the collection on their CONTENTdm server
simplified selection of Dublin Core as the metadata standard, and use
of a subset of the Library of Congress Thesaurus of Graphic Materials
(TGM) for standardized metadata terminology (Library of Congress,
2007). This greatly aided our ability to develop a manageable set of
project-specific guidelines that could be adequately addressed by the
evolving documentation.
The choice of postcards as the source medium turned out to be far
more interesting to the students than was expected. An initial option
to describe the backs of the cards, as well as the front images, was
enthusiastically embraced by virtually all of the students and became
the norm for subsequent classes. Hand-written messages, address
conventions, postmarks, trademarks and other attributes of the cards
were at times as much or even more rewarding to analyze than the
front images themselves. Further, student interest in the varied
aspects of architecture and activities of daily living portrayed in the
postcards led to an expansion of emphasis far beyond the initial focus
of the project on locks, canal boats, shipping, waterways and
transportation.
Background, Technical issues and
Implications for Teaching
Any planning for digitization requires a detailed analysis of one's
institution, and an assessment of where the proposed project fits into
its mission and priorities. Further, consideration of the potential
audience(s), project goals and objectives, resources and limitations,
oversight and long-term maintenance are among the many issues to
be addressed (e.g. see JISC Digital Media, 2008: Project management;
North Carolina Echo Project, 2007). These considerations inevitably
will shape the nature of the evolving project. It is important that an
honest appraisal be conducted, committed to writing, and approved
by the appropriate governing bodies. However, the nature of digital
projects ensures that adjustments inevitably will be required over
time. Changing standards, software and hardware upgrades, technical
glitches, and shifts in the growth of a project are just a few of the
issues which must be dealt with, often on very short notice. Planning
and documentation therefore should be viewed as an iterative process,
where ongoing evaluation is used to address and correct for changing
circumstances.
Creating a list of stakeholders and intimately involving them in
this planning process is critical to success. In our own case, student
feedback on procedures and emphasis has been an invaluable aspect
of the evolving project. Each incoming class section serves as a de
facto Advisory/Editorial Board that contributes to the
decision-making process. These contributions include identification of
additional TGM terms for our thesaurus, the development of
standardized Trademark descriptions, fine-tuning of documentation
and lab handouts, and increasingly higher expectations for the quality
of the metadata. Within a more traditional library environment, all
members of the digitization team, as well as users and other staff
members, undoubtedly will have many valuable insights to contribute.
Among the key elements shaping the evolution of a project-based
digitization course at Queens College were the following:
Institutional characteristics
• When the initial course was developed it was necessary to have the
course proposal cleared with the departmental Curriculum
Committee after consultation with the Chair. This required the
development of course goals and objectives, specific readings, and
course assignments and activities.
• After three semesters teaching the course it was submitted to the
GSLIS and College Graduate Curriculum Committees, Faculty
Senate and, ultimately, the CUNY Board of Trustees for approval
as a permanent course.
• An understanding of the possible pitfalls of the process at every
step was important to ensure that all potential hurdles were
considered and cleared.
• Even outside of explicitly academic environments, proper
attention to obtaining documented approvals and support from
key stakeholders, at all levels up to the governing board, will
prove invaluable in avoiding challenges and ensuring continued
buy-in by the institution and other funding agencies.
Lab facilities (capabilities and challenges)
• For our project we were able to use a 16 workstation Mac lab with
direct connections to the Internet. The lab had been expressly
designed by GSLIS faculty for digitization-related activities and
hands-on learning, in close collaboration with the Queens College
Office of Converging Technologies (OCT) and college architectural
staff, in conjunction with the development of the course proposal.
Appropriate institutional commitment to fund, support and
regularly upgrade such a lab was, and is, essential to the
continuing success of the project.
• Specifications included an instructor workstation (in addition to
student Macs), ceiling mounted projector and wall screen for
demonstrations, two (eventually three) flatbed scanners,
SilverFast AI scanning software, Photoshop, and the Microsoft
Office Suite, particularly Excel.
• A major continuing challenge concerns computer and software
upgrades. The OCT staff do not always consider the rhythms of
the academic year in making changes to the lab, which regularly
causes problems, even after many years of teaching the course.
For example, in the Fall 2009 semester alone, new computers
were installed during the first week of the semester. This resulted
in equipment and software glitches, and a delay in the availability
of the lab, as well as the need to test software functionality and
then revise/upgrade lab handouts with minimal advance notice.
• While we were grateful for the new equipment (a regular
replacement cycle is essential for ongoing functionality), timing
issues resulted in a rough start to the semester.
• A major equipment problem for us was solved when Apple
changed to an Intel CPU. The new Mac computers are now dual
boot (Apple and PC Operating Systems), meaning that they can
now run CONTENTdm (CDM) using the Project Client software
interface. Previously, lab sessions had to be specially scheduled in
a nearby PC lab (CDM Project Client software is not available for
the Mac OS). However, dual boot capabilities have created
additional problems of compatibility, accessibility and ongoing
troubleshooting.
Software
• As noted above, new equipment means software installation and
the attendant complications. The specialized nature of our lab, and
lack of teaching assistants, necessitates that course faculty test all
functionalities and work with OCT staff to address problems.
Oftentimes this has meant repeated testing and troubleshooting,
frequently a day or two prior to a scheduled class. Such technical
malfunctions can wreak havoc on the best-planned teaching
schedule.
• While CDM has been sufficient for our needs, and we are
extremely grateful to METRO for their continuing support, there
are still some issues that cause concern. The biggest is that
students cannot directly upload their input into the database due
to the administrative rights structure. This situation requires
another level of review by the course instructors serving as
database administrators/quality control experts, adding
substantially to time demands near the end of the semester. In
addition, after students submit their data entries for approval,
editing ability on their side is extremely constrained, by both time
and software limitations.
• More recent upgrades appear to have adjusted this limitation,
permitting downloading of materials from the live database for
additional editing if errors are detected. However, this creates
additional levels of oversight and complexity, and assumes that
the instructors will be able to approve the uploads in time for the
students to review and make changes. This is simply not readily
accommodated within a 15 week course schedule.
• Further, although recent versions have been more stable, in the
past CONTENTdm has crashed frequently, causing much
frustration on the part of both students and faculty.
• These points emphasize the steep, and ongoing, learning curve of
digital project-based courses for faculty, support staff and
students.
Support
• Adequate and timely support for equipment and software is
essential to any technology-based project. The GSLIS has a
number of student computer assistants and a campus-wide Help
Desk, but as noted, the specialized nature of our lab sometimes
puts it outside of the realm of their expertise.
• It is good practice to fully document and save ALL help desk
requests and related support communication. These include
emails, screen shots and help desk tickets. These records of
ongoing and recurring problems have proved to be invaluable in
our efforts to ensure follow-through, and to support our case when
requests have not been fully resolved to our satisfaction or when
problems repeat themselves.
• CONTENTdm Help seems to work best when we go through
METRO. That means that an additional layer of contact needs to
be activated any time there is a problem. That said, all relevant staff at
METRO, over the years of this project, have been incredibly
knowledgeable, supportive and responsive to our needs.
Staffing/oversight
• Experience suggests that having a subject expert for image content
is a critical factor. The faculty have, or have developed over time,
sufficient expertise to assist students in their metadata and
description activities.
• Given the need to protect the postcards and the equipment, the lab
has to be under supervision whenever anyone is working. This
greatly adds to the time burden of both faculty and staff.
• Postcards are stored in archival quality sleeves and students use
white gloves when handling the postcards while scanning.
Class size and student characteristics
• The class size is dictated by two elements: the number of work
stations and the work volume. Experience suggests that all
students need to have access to their own workstations, and two
workstations are dedicated to scanning use (a third must be
shared between functions). The initial classes scanned, created
metadata and submitted for approval six cards (front and back),
but many of these initial canal cards were fairly simple rural
scenes. Given the amount of detail that has emerged in postcards
in later semesters, we have gradually reduced input to three cards
(front and back), because the quality and quantity of the metadata
has increased substantially. The time spent in quality control by
instructors has increased commensurately, despite having students
doing quality control on their partners' work.
• Those involved in the digitization process are best served with at
least an intermediate level of computer and software expertise.
We have constructed our teaching labs in a step-by-step fashion,
and utilize in-class time extensively. This allows the faculty to
introduce and demonstrate skills and to detail the various steps of
the project.
• In our experience, students who are highly competent in
computers and/or relevant software or metadata creation have
been more than eager to assist their classmates. This leads to a
highly supportive class environment in which all learn from one
another, modeling (one hopes) the ideal workplace environment.
• However, a fair number of students have no previous familiarity
with the Mac OS, Photoshop, scanning and related software, which
complicates the pace at which the class can proceed. The nature
of our curriculum and scheduling constraints make it difficult to
require pre-requisites beyond the required core courses.
Consequently, a teaching assistant to help in quality control, and
in the provision of additional technical support in lab sessions,
would be extremely desirable for all.
Evolving nature of the target collection
• Initially, the Waterways Post Card Collection consisted principally
of cards of the Erie Canal, with collections of additional New York
State canals such as the Oswego, Seneca, and Champlain Canals.
As demand for the course has remained steady, indeed increased,
the diminishing availability of canal-related cards posed a
potential problem. (Most cards are currently obtained on eBay,
and changing availability in geographic scope is an interesting
topic for another paper.) On the other hand, many of those canal-
related cards that have become available are increasingly
distinctive.
• With the Quadricentennial Celebration of the discovery of the
Hudson River approaching in Fall 2009, it appeared to be a logical
extension of our collecting scope to extend to another key New
York State waterway, the Hudson River. We included in our
selection criteria cards depicting New York Harbor and the East
River. The first such Hudson River cards were digitized in Fall
2008.
• The expansion in scope of the cards created fascinating but
unanticipated challenges. Publishers, trademarks, and the
increasing complexity of the images depicted required a
substantial expansion of the TGM thesaurus, as well as the
development of descriptions of an increasingly diverse set of
trademarks, logos, stamp boxes and postmarks.
Consistency and accuracy
• Digitizing any collection over a period of time by a changing group
of participants creates somewhat greater consistency and accuracy
problems than might apply in a short-term project. In spite of the
iterative editing of documentation, inconsistencies and errors are
regularly emerging in our project. To a large degree, this is due to
the pressures of a fast-paced curriculum, a constantly changing
panoply of operating systems, software versions, additions to our
thesaurus and the ongoing, changing nature of the cards within
our purview.
• Our experience has shown us that the students themselves are the
best editors in catching errors and inconsistencies. It is obvious
that the road to "perfect" metadata, documentation, labs and
handouts is continuous, difficult, and perhaps, ultimately elusive.
Such is the nature of a work in progress.
• It helps to have students who have a keen eye for detail as well.
The project was significantly enhanced when then-student Susan
Savage completed an Independent Study project in Spring 2007,
that corrected many of our past mistakes, and developed the
scaffolding of our current metadata documentation. It is now
obvious that outside help in editing is an important part of the
process (although not easily achieved).
Key readings and course activities and relevance for the project
• Clearly, carefully selected readings are critical to the success of a
digitization project. We provide access to a range of resources in
an effort to meet the needs of those at varying levels of familiarity
with digitization and related issues. Alumni feedback has
suggested the importance to many of providing continued access
to the resources once our graduates are working in the field. Once
involved in a real world project, many become even more aware of
the importance of items that may not have seemed salient at the
time of the course.
• Students complete a "Tech Review Exercise" in week seven, to
document their understanding of key technical concepts. This
reduces the need to cover many basic concepts during lectures,
and frees class time for practical development of skills and discussion.
• In addition, students prepare a detailed case study of a well-
documented digital collection, and present an overview of their
sites to class members. Examining the successes and challenges of
a substantive collection reinforces the concepts and principles
learned and applied throughout the course. Students comment
with great pride on the degree of detail of the metadata they have
developed for their cards, as compared to some of the case studies
examined, including many well-funded projects. For many
respondents to our end-of-semester evaluation surveys, the case
study is perceived as a high point of the course.
Project Procedures and Workflow
Assignment of cards to students
• Postcards are carefully selected in order to give each student a
maximum of exposure to bibliographic richness, varying formats
and levels of difficulty. For instance, at least one card will have
writing on it for transcription. Cards also are selected on the basis
of color or black & white, as well as postcard era (e.g. divided back;
undivided back). In addition, postmarks, publishing information,
trademarks and stamps are considered in allocating cards. A
lecture on the history of postcards provides background in
understanding the evolution of these standards (e.g. see
Smithsonian Institution, n.d.).
• Once selected, each card is given an eight-character alphanumeric accession
number that identifies the subcollection, card number (a total of
9999 entries are available) in that series, the type of image (access
or archival), and whether or not it is a front or back view. Early on
in the project it was realized that having a related but unique
identifier for each side was vital to having both sides of a card
displayed together. At this point the Waterways series identifiers
are: Champlain (c), Erie (e), Delaware and Hudson (d), Hudson
River (h), and Seneca (s). Thus, a card in the Hudson River series
available on the Web will have two accession numbers - one for
the front and one for the back: e.g. h0062ac1, h0062ac2. Slightly
different file naming schemes are employed for the master
archival files (e.g. h0062ar1.tif), so that the nature of the file
format is evident even without the file extension.
• We adhere to the ISO 9660 8.3 alphanumeric naming standard to
ensure that our files will be compatible across platforms (JISC,
2008, Choosing a file name).
• Stickers with the student names and the accession numbers are
printed and attached to the outside of the archival sleeves that
contain the assigned postcards. The sleeves are put into a binder
so that students can access them, and photocopies of fronts and
backs are created to ensure a record of assignments.
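The accession scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are hypothetical, the pattern is inferred from the examples in the text (series letter, four-digit card number, "ac" or "ar", then a side digit), and the reading of "2" as the back view is inferred from the paired examples rather than stated outright.

```python
import re
from typing import NamedTuple

# Series letters as listed in the text above.
SERIES = {
    "c": "Champlain",
    "e": "Erie",
    "d": "Delaware and Hudson",
    "h": "Hudson River",
    "s": "Seneca",
}
IMAGE_TYPES = {"ac": "access", "ar": "archival"}
# Assumption: 1 = front (seen in the examples), 2 = back (inferred).
SIDES = {"1": "front", "2": "back"}

_ACCESSION = re.compile(r"([a-z])(\d{4})(ac|ar)([12])")


class Accession(NamedTuple):
    series: str
    card: int
    image_type: str
    side: str


def build_accession(series: str, card: int, image_type: str, side: int) -> str:
    """Assemble an eight-character accession number such as 'h0062ac1'."""
    if series not in SERIES or image_type not in IMAGE_TYPES or str(side) not in SIDES:
        raise ValueError("unknown series, image type, or side")
    if not 1 <= card <= 9999:
        raise ValueError("card number must be between 1 and 9999")
    return f"{series}{card:04d}{image_type}{side}"


def parse_accession(number: str) -> Accession:
    """Split an accession number back into its labelled parts."""
    match = _ACCESSION.fullmatch(number)
    if match is None or match.group(1) not in SERIES or int(match.group(2)) == 0:
        raise ValueError(f"not a valid accession number: {number!r}")
    letter, card, image_type, side = match.groups()
    return Accession(SERIES[letter], int(card), IMAGE_TYPES[image_type], SIDES[side])
```

A check of this kind could be run over a folder of file names before upload to catch mistyped identifiers, although the chapter does not say that the project automated this step.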
Scanning and Creating Derivatives
• Once the postcards are distributed to students, they engage in
hands-on instructional labs in scanning. Students use SilverFast
Ai software, in conjunction with the latest version of Photoshop,
and the currently available flatbed scanners (these are regularly
updated) to create uncompressed Tagged Image File Format
(TIFF) scans. The resulting archival files, in the 20 MB range, are
stored (and backed up) offline and can be examined by students if
extensive detail is needed. These files are the heart of any
database because they can be used for non-web purposes.
• Using Photoshop, students then create JPEG derivatives at
medium compression with a resolution of 150 ppi with 1000 pixels
on the long dimension (following the CDL Guidelines for Digital
Images: California Digital Library, 2009). The "ar" in the archival
file names is changed to "ac" in the accession numbers to reflect the
change in file size and specifications. In this process the archival
masters are reduced in size to approximately 200 KB, appropriate
for web viewing, and yet still large enough to zoom for some
degree of detail.
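The file sizes and derivative dimensions quoted above follow from simple arithmetic, sketched here in Python. The 600 ppi, 24-bit scan settings and the 3.5" x 5.5" card size are taken from the sample record later in this chapter; the function names are hypothetical, and the sketch only computes sizes and names (the project itself used Photoshop to produce the files).

```python
def uncompressed_bytes(width_in: float, height_in: float, ppi: int,
                       bits_per_pixel: int = 24) -> int:
    """Estimate the pixel-data size of an uncompressed scan, ignoring headers."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


def derivative_size(width_px: int, height_px: int, long_side: int = 1000):
    """Scale pixel dimensions so the longer side equals long_side."""
    scale = long_side / max(width_px, height_px)
    return round(width_px * scale), round(height_px * scale)


def access_name(archival_name: str) -> str:
    """Turn an archival master name (h0062ar1.tif) into its access name."""
    stem = archival_name.split(".")[0]
    if len(stem) != 8 or stem[5:7] != "ar":
        raise ValueError(f"not an archival file name: {archival_name!r}")
    return stem[:5] + "ac" + stem[7:] + ".jpg"


# A 5.5" x 3.5" card scanned at 600 ppi in 24-bit color:
print(uncompressed_bytes(5.5, 3.5, 600))   # 20790000 bytes, i.e. about 20 MB
print(derivative_size(3300, 2100))         # (1000, 636)
print(access_name("h0062ar1.tif"))         # h0062ac1.jpg
```

Actual scans vary slightly with cropping, so real derivatives may differ by a pixel or two from the computed figure.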
Assigning Metadata
• The heart of the course is when students assign metadata. A group
exercise using different sample postcards for each pair of students,
which is then discussed within the class, provides basic experience
in assigning subject terms. What is initially thought of as a simple
process quickly becomes complex when the students are
confronted with the reality of their individual cards.
• Each student is given a thesaurus, based primarily on the
Thesaurus of Graphic Materials (TGM), that contains all of the
subject terms to be used; an electronic version can also be used to
copy/paste the terms. In addition, detailed handouts with
trademark, stamp data, and metadata procedural guidelines are
made available in print and digital form. Metadata and JPEG
images will be inserted into an Excel template, which lists all of
the metadata requirements according to CDM fields (tailored to
our project), and the corresponding Dublin Core fields.
• As needed, additional terms from the TGM are added to our
project thesaurus to reflect characteristics not previously
encountered. For example, the move to using cards depicting New
York Harbor required the addition of such terms as skyscrapers,
aquariums and ferries.
• Each week, short lectures introduce the students to canal history,
lore and terminology. Once the basics are covered students are
then given time to insert JPEGs of the fronts and backs of their
first card (the simplest) into an Excel template. They then proceed
to select subject terms, develop descriptions, and identify key
Dublin Core fields (e.g. Title, Creator, etc.).
• The faculty provides assistance and guidance during this process,
and students are encouraged to work with a partner. Initially
there is a great deal of trepidation, but as students gain experience
and confidence less faculty attention becomes necessary.
Quality control teams
• In order to provide more experience to the students, and to serve
as a double check before the metadata is passed on to the faculty,
they are paired up with a quality control partner, and required to
exchange their Excel metadata files. Each student then reviews
his/her partner's metadata, starting with the initial card, and
makes any necessary editorial changes and/or suggestions for
improvement using a different colored font. The reviewed
metadata set is then forwarded to the faculty for their comments
and edits, added in yet a different color font. This provides an
iterative record of changes: the input of initial author and control
partner, and final corrections by faculty. After receipt of faculty
feedback, the teams can then proceed with subsequent cards.
• This staggered approach is recommended because students tend
to make the most errors in their first attempts, and to learn from
the ongoing feedback. Additional benefits to the project include
the identification of new trademarks in the initial set of cards,
descriptions of which can then be shared with others on
subsequent cards. Lastly, repeated mistakes may reveal
unintended errors or inconsistencies in documentation that can
then be revised to reduce future errors.
• While this would seem to be a straightforward process it has
turned out to be much more challenging than first envisioned.
Yes, the students essentially get experience with six cards rather than
three, which is good. But a number of other problems have
resulted. It is almost always the case that the better students pair
up and that they turn in metadata that needs little comment.
However, it still takes an average of fifteen minutes per relatively
error-free file for faculty to review submissions. Weaker students
require much more faculty review time either because the editing
process is sloppy or submissions are late. In some cases faculty
review and editing can take as much as an hour per Excel file.
Given time constraints and course deadlines this can get extremely
stressful to all parties.
• That said, it continues to surprise the faculty that: 1) the quality of
most metadata submissions is so impressive, and that 2) there
remain previously unidentified errors in what seems to be a fairly
strict process. In this regard the quality control process is working
as envisioned.
Below is a transcribed example of the metadata for "1609 • HUDSON-FULTON
CELEBRATION • 1909 [front caption] (Ifront) [h0189ac1]" after it has been uploaded
to CDM.
Title: 1609 HUDSON-FULTON CELEBRATION 1909 [front caption] (Ifront)
[h0189ac1]
Creator: Copyright 1909 J. Koehler, N.Y. [indicated on front only]
Subject — Front: Cliffs, Clouds, Flags, Grasses, Portraits, Rocks, Ropes,
Schedules (Time plans), Ship equipment & rigging, Shrubs, Smoke,
Smokestacks, Steam engines, Trees, Men, Passengers, People, Color
postcards, Sailing ships, Side wheelers, Aerial views, Rivers.
Description-Front: A commemorative postcard celebrating the 300th anniversary
of the discovery of the Hudson River, with portraits of Henry Hudson and
Robert Fulton superimposed over a daytime aerial view of the Hudson River.
Prominently featured are a sailing ship (circa 1609) and steamship (circa
1809) [presumably the Claremont] which together serve to commemorate the
passage of time from discovery to the modern day. Soaring cliffs line the far
bank and along the near bank; at right, there is a gathering of people (perhaps
Native Americans). An information box titled 1609 HUDSON-FULTON
CELEBRATION 1909, lists the following 15 events: Sept. 25 Commencement
Day N.Y., Sept. 26 Religious Observance Day NY., Sept. 27 Reception Day
N.Y., Sept. 28 Historical Parade NY., Sept. 29 Commemoration Day NY.,
Sept. 30 Military Parade Day NY., Oct 1 Naval Parade N.Y., Oct 2 Naval
Carnival Parade N.Y., Oct 3. Religious Day Upper Hudson, Oct. 4 Dutchess
Co. Day, Oct. 5, Ulster Co. Day, Oct. 6 Green Co. Day, Oct. 7 Columbia Co.
Day, Oct. 8 Albany Co. Day, Oct. 9 Rensselaer Co. Day. COPYRIGHT 1909
BY J. KOEHLER, N.Y. [indicated on front only].
Coverage - Geographic: Hudson River, New York and New Jersey
Date Original: 1909?
Publisher: Graduate School of Library and Information Studies - Queens College
(CUNY), New York, New York
Language: eng
Source Height: 3.5"
Source Width: 5.5"
Source: Waterways Post Card Collection of Thomas T. Surprenant: Hudson River
Type: Text; Image
Digitization Specifications: Archive masters were scanned at 600 ppi, 24 bit
color using a Canon CanoScan 8800F and SilverFast Ai scanning software
and saved as uncompressed TIFFs using Photoshop CS4 and Mac OS 10.5.
Derivative access files were saved as medium quality JPEGs, 150 dpi, 1000
pixels on the long dimension. Thumbnails were generated automatically by
CONTENTdm.
Date Digital: 2009-09-30
Resource Identifier: h0189ac1.jpg
Format: Image
Format [Medium]: Postcards
File Size: 218972 Bytes
Checksum: 287714794
Height (of Digital Image): 1000
Width (of Digital Image): 637
Color Space: sRGB
Rights Management: Contact Thomas T. Surprenant, ebeltoms@nyc.rr.com
Contributing Organization: Graduate School of Library and Information Studies,
Queens College (CUNY), New York, New York
Digitization Team: GSLIS 757, Digital Imaging Fall 2009: Beth Daniel Lindsay,
Meg Donabedian
(GSLIS, 2006-2009, Waterways of New York V. 2. 1).
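Some of the fields in a record like the one above lend themselves to mechanical checking before faculty review. The following Python sketch is purely illustrative: the three rules are assumptions drawn from the specifications described in this chapter (access file naming, 1000 pixels on the long dimension, an ISO-style digital date), not a description of any tool the project used, and `check_record` is a hypothetical name.

```python
import re
from datetime import date

# A few technical fields copied from the sample record above.
SAMPLE = {
    "Resource Identifier": "h0189ac1.jpg",
    "Height (of Digital Image)": "1000",
    "Width (of Digital Image)": "637",
    "Date Digital": "2009-09-30",
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Access files are named series letter + 4 digits + "ac" + side + ".jpg".
    identifier = record.get("Resource Identifier", "")
    if not re.fullmatch(r"[a-z]\d{4}ac[12]\.jpg", identifier):
        problems.append(f"unexpected Resource Identifier: {identifier!r}")

    # Derivatives should measure 1000 pixels on the long dimension.
    try:
        height = int(record["Height (of Digital Image)"])
        width = int(record["Width (of Digital Image)"])
        if max(height, width) != 1000:
            problems.append("long dimension is not 1000 pixels")
    except (KeyError, ValueError):
        problems.append("pixel dimensions missing or not numeric")

    # Date Digital is recorded as YYYY-MM-DD.
    try:
        date.fromisoformat(record.get("Date Digital", ""))
    except ValueError:
        problems.append("Date Digital is not in YYYY-MM-DD form")

    return problems


print(check_record(SAMPLE))  # []
```

Checks of this kind cannot judge subject terms or descriptions, which remain the work of the student and faculty reviewers.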
Uploading to CONTENTdm
• By mid-semester, the students are introduced to the basics of the
CDM Project Client software in a hands-on lab. They upload a
sample image, are guided through the creation of a "Waterways of
New York" banner to be automatically inserted for future uploads,
and— following step-by-step directions— use the Template Creator
function to input recurring fields (e.g. Source, Digitization
Specifications, Digitization Team), and to enable CDM's capacity
to automatically capture technical metadata such as Resource
Identifier, File Size, Format and Checksum fields.
• They then upload another sample image to demonstrate that these
functions are working, and to double-check (and correct) any
possible errors in data entry.
• The following week (assuming appropriate progress on early steps
in the workflow), they are ready to upload their first postcard
JPEG images (front and back, entered as individual files). Early on
in the project we decided against uploading the fronts and backs of
cards as compound objects, since this would severely limit the
amount of detailed metadata that could be made available.
• Students "cut-and-paste" the doubly reviewed metadata from their
Excel files into the appropriate fields in CDM. This approach was
especially important when we had limited access to the Project
Client interface in the adjacent PC lab, and at times of instability
in CDM at the client end.
• Following an additional review by faculty for completeness and
accuracy at the desktop, students then upload the files to be
approved in the Administrative interface.
• It provides tremendous positive feedback to all to see the fruits of
many weeks of work appear live in the evolving database, in real
time, during this most momentous class session.
• In subsequent weeks, a portion of each class is devoted to
additional project work, reflecting whatever particular step each
student is addressing.
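For readers unfamiliar with the technical metadata that CDM captures automatically (Resource Identifier, File Size, Checksum), the Python sketch below shows how such values can be derived from a file. It is not CONTENTdm's own code: the chapter does not say which checksum algorithm CDM uses, so CRC-32 from the standard library stands in here purely as an assumption, and `technical_metadata` is a hypothetical name.

```python
import zlib
from pathlib import Path


def technical_metadata(path) -> dict:
    """Derive file-level metadata similar to the fields CDM fills in itself."""
    file_path = Path(path)
    data = file_path.read_bytes()
    return {
        "Resource Identifier": file_path.name,
        "File Size": f"{len(data)} Bytes",
        # Assumption: CRC-32 is used only for illustration; the algorithm
        # behind CDM's Checksum field is not stated in the source.
        "Checksum": zlib.crc32(data),
    }
```

Recomputing these values for a file downloaded later, and comparing them with the stored record, is one simple way to confirm that the file has not changed.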
Timing /scheduling issues
• Given the pressure of a 15 week semester, and the need to respond
quickly to student submissions, the entire quality control process
takes much more time than initially projected. Yet, there seems to
be no alternative to preventing major errors from creeping into the
database. And students regularly report, in mid-year and year-end
evaluations, that they find the hands-on nature of metadata
creation and feedback to be among the best aspects of the course.
• The time constraints on classes (once a week for 2 hours 35
minutes) make it difficult to teach what is needed and to allow
students the time to practice. This is particularly true for
commuting students, most of whom hold full-time jobs and may
come to school only once a week. This is complicated by
extremely time-consuming commutes (for students living in
Brooklyn, three or more hours round trip is common).
• It is not unusual for a few students to still be attempting to scan
(or rescan) their postcards during the seventh or eighth week of
class (or to have to redo their derivative files due to lack of
attention to digital specifications). This puts the faculty, and
student partners, into a real bind when it comes to their turn to
evaluate student work.
How much detail is enough/too much?
• This question has vexed both the faculty and students since the
inception of the course. One of the best aspects of the project is
the information rich content of the postcards, both back and front.
Students have been particularly diligent in finding, and fighting
for, new subject terms and descriptive material. This makes the
course an ever evolving environment with an endless series of new
things to consider and debate.
• Especially dedicated students have taken the time to research
historical events depicted in their cards, leading us to add a
"References" field in Fall 2008, where they can add more
background and even Web citations.
• Student interests, skills, and insights have made this very much a
"student-driven" course, in the best sense of the phrase. Every
class has made unique contributions to the quality of the project,
and all (both faculty and students) learn from one another. The
collective contributions of the myriad participants in this project
cannot be overstated.
Supervisory issues
• As previously noted, the class inevitably encounters difficulties
with students who either are forging ahead or are lagging behind
others. In any given class the faculty can expect to see a certain
level of frustration as students cope with their individual abilities
and challenges.
• The evolving documentation, hardware and software problems
sometimes create an illusion that the faculty are not well prepared.
It doesn't take much to derail what has been planned for any given
class. This adds to the timing and scheduling issues discussed
above. On the other hand, as more perceptive students note, this
is the way projects function in the real world, and provides
insights they can share in future job interviews of how they dealt
with unforeseen complications.
Role of independent study students in quality control review
and overhaul of documentation
• To date, there have been only two Independent Study students
interested in assisting with quality control, one for the METRO-
funded archival project (Amy Armstrong), and one for the
Waterways project (Susan Savage). They both made extraordinary
contributions to the projects in terms of documentation and
quality control. There is no doubt that, in the near future, more
students and/or faculty will have to be brought into the process of
oversight and review in order for existing problems and errors to
be addressed, but there are substantial challenges in envisioning
how this may be possible in the current economic climate.
• The same difficulties exist with the revision of documentation.
Incremental changes have resulted in some inconsistencies, and
outright errors, in the documentation. At this point having better
documentation has a higher priority than quality control review of
the existing database. Better documentation will result in better
quality control. During the coming year the faculty intend to make
major revisions of the documentation to ensure that it is not only
up-to-date but more accurate.
Expansion of the database
• We anticipate that, sometime in the near future, the database will
have to expand beyond a personal collection. Two preliminary
moves in that direction have already been undertaken. The
Ellenville Public Library and Museum permitted its Delaware and
Hudson (D&H) postcard collection of 70+ cards to be scanned and
inserted into a separate Greenstone database. Similarly, the Erie
Canal Museum in Syracuse permitted us to create archival TIFFs of
55 postcards. Students then created derivatives and assigned
metadata on these cards, in addition to several they scanned
themselves.
• From experience it is, indeed, gratifying to see that a spirit of
participation exists within many of the various libraries, museums
and historical societies. More needs to be accomplished in this
arena and there are plans to work with additional organizations in
the coming year.
• Once the collection becomes more mature, enlisting other
postcard collectors to contribute to the database is a strong
possibility.
• Any future expansion will have to consider moving beyond
postcards into other forms and formats. While this is a logical
extension, especially with photographs, maps and important
historical documentation, the level of difficulty increases. In order
to keep students in the process it is likely that the faculty will have
to do a great deal of traveling to digitize collections that are all
over the state.
Successes and Lessons Learned
The literature of "Best Practices" emphasizes the importance of an
"Advisory Council", and the involvement of key stakeholders, to
provide feedback and insight into the successes and challenges of an
evolving project (e.g. Chapman, 2000). Our students have exceeded
every expectation in this regard, and reinforce the importance of input
from a variety of stakeholders in the advisory process. Subject
headings and description have expanded from an initial focus on
canal-specific characteristics (e.g. tugboats, locks), to many attributes
of the environment and daily life in early 20th century New York.
Detailed rules and guidelines provide standardized descriptions
(developed by the students) of trademarks, logos and postmarks and
ensure consistency with AACR2 and other standards.
Perhaps most importantly, students have gained an appreciation
for every aspect of the overall planning, processes and implementation
pertaining to the creation of at least one type of digital collection
(images). This practical experience is embedded in an exposure to key
readings in the literature, class discussions, and a detailed case study
analysis of an existing collection. In particular, they come to
understand the enormous demands of metadata creation and quality
control, and the tradeoffs that may lead digital collection managers to
minimize this aspect of their projects. The role of students as key
project participants has been extremely valuable in providing concrete
evidence of their experience as they search for jobs in this difficult
economy. It has been equally valuable for potential employers seeking
qualified job candidates.
At this writing, the project continues, with two course sections
scheduled for Spring 2010 and no end in sight. Our students have
gone on to excellent positions in the field and the feedback from
intern supervisors is consistently positive. With any luck, our
collection will exceed 1,300 images by early 2010.
Challenges and next steps
In addition to many of the future plans outlined above, we are
seriously considering migration of the database to an open-source
platform, such as Omeka (http://omeka.org/), Greenstone
(http://www.greenstone.org/ ) or other available options. Running
such a digital library collection platform in parallel with CDM would
present a valuable learning tool for students, while adding to the
complexity of the project.
Possible student interest in working on this initiative as an
independent study project, coupled with a soon-to-be-completed
search for a new lecturer line at the GSLIS, with specializations in
digitization and related technologies, provides some basis for
optimism as we move into the next decade. And we remain
committed to this labor-of-love that has inspired and empowered so
many classes of students.
Yet on the challenges side, for faculty it remains a difficult
balancing act. The nature of a project such as this requires continuous
updating and editing to ensure consistency and quality in the ongoing
development of the project. This process places serious stresses on
the competing obligations of supervising faculty for research,
publication, service, staying current, and responsibilities for other
courses. This is particularly true during a major period of transition
within our department. But some things are very much worth doing,
and we hope this example may provide encouragement to others
starting out, or expanding their existing initiatives. We look forward
to extending the dialogue!
References
California Digital Library. (2009, September). "3.5 Guidelines for
access and thumbnail image files." CDL Guidelines for Digital
Images, Version 2.0. Retrieved December 31, 2009, from
http://www.cdlib.org/inside/diglib/guidelines/bpgimages/
Chapman, S. (2000). Section III: Considerations for project
management. In M.K. Sitts (Ed.) Handbook for digital projects: A
management tool for preservation and access. Retrieved
December 30, 2009, from
http://www.nedcc.org/resources/pubs.php
Graduate School of Library and Information Studies. Queens College,
CUNY. (2005-2009). Digitization projects. Retrieved January 3,
2010, from http://qcpages.qc.edu/GSLIS/digitization.html
Graduate School of Library and Information Studies. Queens College,
CUNY. (2006-2009). Waterways of New York V. 2.1. Retrieved
January 3, 2009, from
http://cdm128401.cdmhost.com/cdm4/browse.php?CISOROOT=%2Fqcgslis_f06
JISC Digital Media. (2008, November 11). Choosing a file name.
Retrieved September 2, 2009, from
http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/choosing-a-file-name/
JISC Digital Media. (2008, November 14). Project management for a
digitisation project. Retrieved December 31, 2009, from
http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/project-management-for-a-digitisation-project/
Library of Congress. (2007). Thesaurus for graphic materials I.
Retrieved December 31, 2009, from
http://www.loc.gov/rr/print/tgm1/
North Carolina ECHO Project. (2007). Project planning.
Digitization guidelines, 2007 revised edition. Retrieved January 2,
2010, from http://www.ncecho.org/dig/guide_1planning.shtml
Smithsonian Institution. (n.d.). Greetings from the Smithsonian: A
postcard history of the Smithsonian Institution. Retrieved
December 31, 2009, from
http://siarchives.si.edu/history/exhibits/postcard/chronology.htm
Part II - A Diverse Digital
Landscape: Digital Collections
in Public Libraries, Museums,
Cultural Heritage Institutions,
and Knowledge-Based
Organizations
Managing Rights in a Medium Scale Audio
Digitization Project
Barbara Taranto and Elizabeth Bradley
(New York Public Library)
Abstract
In 2007 New York Public Library received a small grant to digitize a
selection of audio recordings of public programs, including lectures,
interviews, and panel discussions that took place at the Humanities
and Social Sciences Library between 1983 and 2003. This chapter
briefly discusses the scope of the project and the institutional reasons
for the initiative. The chapter then discusses some of the significant
challenges regarding Intellectual Property Rights faced by the Library
and the strategies it employed to deal with the issues. The chapter
concludes with a discussion of the policies and procedures the Library
has since put in place to manage audio rights.
Keywords: Access, Audio, Copyright, Media, NYPL, Privacy,
Public library, Recording, Taping, Transcription.
The Genesis of the Project
New York Public Library has been engaged in digital projects since the
mid-1990s, when it launched its first public website. In 1998 it
published African American Women Writers of the 19th Century
(Webpage Dodson, 1998), a compilation of digital "reprints" of
out-of-print and essentially unavailable 19th century materials from the
holdings of the Schomburg Center For Research In Black Culture. By
2000 the Library had established a new unit, the Digital Library
Program, to manage the creation and publication of digitized content
from its collections. Within the year an even larger project - the
digitization of half a million pictorial items - NYPL Digital Gallery
(see New York Public Library Digital Gallery, n.d.) - was undertaken
and by 2003 the Library was producing approximately 1250 new
image files - complete with metadata - per week.
Public domain items were chosen for the first digitization efforts.
This allowed the Library to leverage existing bibliographic records for
metadata creation without additional research into intellectual
property rights. It also allowed the Library to provide a new and
exciting online resource for the public.
As the program matured, policies and best practices were
developed, including how to manage the rare but occasional issue of
Intellectual Property (IP). The metadata system was modified to
record Intellectual Property rights and access permissions. Processes
were established to redact items that were mistakenly digitized and/or
published before IP permissions had been received, and new content
was being created daily. At the same time the Library began
experimenting with new formats such as geospatial data and media
files. It also began to look closely at user needs and at new avenues for
distribution such as mobile and cellular applications.
As patrons became more experienced web users and their home
computing technology improved, user expectations for easy and quick
access to rich content increased substantially. Public domain
materials were no longer sufficient. Consequently in 2007, the Library
decided to initiate a pilot project to determine what was involved in
managing rights-encumbered collections in the digital environment. It
requested and received a grant from the Metropolitan Library
Association to digitize recordings of well-known public personalities
and ultimately make them available to the public via the web under
the title 24 Hours at NYPL.
The Scope of the Project
We wanted to test the waters for audio rights management at an
institution with a mandate for accessible content delivered at no
charge. There were three key goals: 1) discover the issues involved in
rights-encumbered audio; 2) determine best practices for resolving
these issues; and 3) develop a workflow to manage collections with
similar issues. A small sample from a contemporary but relatively
high-profile collection would be suitable for this purpose since the
subjects involved were already rights-savvy performers. It was also
decided that to the best of its ability, the Library would seek
permission from these rights holders to make these materials freely
available to the public without cumbersome access controls.
The Literature
There is a plethora of documentation on Digital Rights Management
(DRM) and DRM products available to leverage and control access to
digital assets for commercial gain and/or Intellectual Property
management (e.g., ARALOC, 2010; Stream Media, 2010; and Discretix,
2010). Everything from watermarking to anti-cloning technologies is
available to "lock down" content. Likewise, there are many Digital
Library resource pages (e.g., Berkeley Digital Library, 2007) detailing
these technologies and the consequent policy implications for
managing copyrighted and licensed media in a library setting.
Furthermore, there is a very lively scholarly discourse in the
preservation community around the proper expression (language) of
digital rights and the sharing of digital rights information (see Premis,
2010). All of which is exceedingly helpful once the data have been
obtained.
Many established practices guide digital activities, such as national
standards for reformatting, file naming conventions, persistence, etc.
(e.g., Washington State Library, n.d., and North Carolina ECHO, n.d.).
However, processes and procedures for managing a digital rights
project in a library setting, especially a public library, are scarce.
In fact, it is for this reason that the New York Public Library chose
to work on these collections: this is relatively new ground.
We chose to digitize a selection of audio recordings of public
programs, including lectures, interviews, and panel discussions that
took place at the Humanities and Social Sciences Library between
1983 and 2003. The recordings, holdings from the Public Education
Program (PEP), represent some of the best of the thousands of public
programs offered for a small fee to all comers by the Humanities and
Social Sciences Library during those decades. Many of these events
were held in a grandly restored space called the Celeste Bartos Forum,
a venue that has, as a result, become familiar and beloved to
generations of New York City culture-seekers.
The programs chosen for digitization included talks and
interviews featuring world-renowned writers, thinkers, and public
intellectuals such as Chinua Achebe, A.S. Byatt, Umberto Eco,
Christopher Hitchens, Robert Thurman, Oliver Sacks, Esmeralda
Santiago, Sarah Vowell, and Wendy Wasserstein, among many others.
Permission to record the programs was sought from each subject prior
to the event. They were originally recorded live, in analog formats
including reel-to-reel and cassette tapes.
When choosing these particular programs, preference was given
to those subjects whose dynamic performances were the most in
keeping with the goals of the digitization project, and for which
permission was anticipated to be negotiated and obtained without too
much difficulty or searching. For these reasons as well, single subjects
(interviewees) were preferred.
Rights Issues with Recorded Programs
For the most part flat art works (as plates or reproductions or original
art works) are associated with a wide but manageable universe of
rights holders. Even maps which have more "creators" than most print
materials - engravers, cartographers, drafters, navigators, colorists,
etc. - are fixed. The rights pertain to certain aspects of the creation of
the object but do not extend to the subject of the work such as "the
State of California" or to all parties that participated in the making of
the work such as the oiler who serviced the printing press.
The digital reformatting of the audiotapes was unexceptional.
The tapes were sent to a reputable vendor and quality control was
performed on the files when they were returned to the Library.
Technical metadata was recorded and deposited into a database
designed for keeping preservation technical data. The Library's
metadata specialists created descriptive metadata using the in-house
utility. However, obtaining permission from the speakers on these
recordings proved to be an unexpected and ongoing challenge.
The first step in the rights process is to determine who the rights
holder is. This involved contacting the subjects and in some cases
editors, managers and lawyers, to match the audio tapes to their rights
holders. While the subject was clearly known, the rights
documentation needed to be created (with the assistance of the NYPL
General Counsel's Office) and managed by NYPL staff, in addition to
the management of the media asset. In this case, the Library
deliberately chose items where the rights were clearly held by a known
body, preferably one with a cordial, ongoing relationship to or affinity
for the mission and work of the NYPL. Nonetheless seeking and
obtaining permission even when the rights holder is undisputed is
often tedious and frustrating. In fact, a good deal of time was spent
finding and attempting to contact the subject in question. Once
contacted, there were many issues to resolve.
Rights holders were often difficult to reach. Sometimes this was
due to the age of the information (addresses were out of date, names
had changed, etc.) or the amount of time that had elapsed since the
creation of the content. Sometimes this was due to privacy needs or
the sheer number of intermediaries (such as editors, agents, and
business managers) that had to be included in the negotiation process.
Sometimes there was no one left to contact. The rights may have
reverted to the estate in the case of an individual's death or passed to
another individual without a clear record. In these cases it was nearly
impossible to determine the rights.
In a few cases permission was obtained when considerable time
was invested. Often rights holders were unfamiliar with the digital
environment and needed to be educated by Library staff about the
nature and purpose of digitization. This was a slow process, and it
took a good deal of time and patience to explain everything in adequate
detail. It also required finesse to describe technical processes in a
way that was meaningful to non-technical individuals. Sometimes this
was successful, sometimes not.
Certain more commercially successful individuals, whose fame
had increased in the ensuing years, retained counsel that aggressively
challenged the Library's request, sometimes with and sometimes
without cause (e.g., an individual may not want to relinquish rights to
an unedited version of a performance, to avoid the possibility of
incurring liability for comments made during the performance).
These individuals seemed to be motivated by one of two factors -
control over the distribution of creative content produced at an earlier
stage in the performer's career, and/or the possibility of generating
revenue from the digitized program.
Certain rights holders were unable to remember the contents of
the programs they had recorded. Others were uncomfortable with the
idea that their material would be available for free streaming
broadcast and download via iTunes University. Despite iTunes'
seemingly universal brand, the prospect of an NYPL partnership with
a commercial vendor confused and possibly alienated some of the
performers (or, as was more often the case, their legal counsel).
One strategic decision made by Library staff was to commission
written transcripts of the audio recordings of the events that we
wished to digitize, so that we might make the scripts of the events
available to anxious authors, publishers and literary agents, many of
whom suggested that they could not remember the content of a talk
delivered so many years before, and were therefore reluctant to give
their approval in advance of reviewing it.
The issues presented as follows:
• Rights holder in dispute
• Rights holder incommunicado
• Rights holder is the estate of a deceased person
• Rights holder unfamiliar with technology
• Rights holder represented by aggressive counsel
• Rights holder unable to remember contents of discussion
• Rights holder uncomfortable with venue
For the most part subjects who participated in these public
programs were initially hesitant to grant permission to digitize and
republish the program on the web, which for many of them
constituted a new and unusual forum for the distribution of their
work. Interestingly, younger performers gave permission with
considerably more alacrity and ease than did older performers (or
their counsel). As a result of these unexpected challenges, only a
limited number of the digitized public programs have been made
available to the public. However, the process of negotiating for rights
and gathering permissions is still ongoing, and over time many more
subjects have been persuaded to participate in the project.
The digitized materials are preserved for posterity and the
existence of this trove of live audio material featuring celebrated
authors and thinkers (all of whom have book holdings at the Library,
and some of whom have deposited their personal papers or
manuscripts, as well) is a source of great interest and enthusiasm for
Library staff and the users who have learned about it thus far.
Lessons Learned
Few of the digitized tapes have been made available for public access.
However, the project met two of the three key goals: the issues
involved in rights-encumbered audio were discovered and a workflow
for handling similar issues was established. The third goal -
establishing best practices for obtaining permission - can be derived
from the case studies in the project.
Many of the subjects were initially reluctant to grant permission
for very specific reasons, which suggests that addressing
these specific concerns can lead to more successful negotiations.
Educating the rights holders (and sometimes their counsel) about the
new medium of digital audio allayed or eliminated many of their
material concerns. By providing written transcriptions as memory
aids and evidence to be inspected by counsel, some subjects were able
to make a positive, informed decision regarding the Library's request.
By being sensitive to privacy issues, ambassadorial about digital
content, and above all else, being politely persistent, the Library has
been increasingly successful.
New Practices Born From 24 Hours at NYPL
The entire process of the project is dependent on documentation.
Without the proper releases in place these projects are not possible. If
releases do not exist then they must be obtained and rights must be
clearly spelled out. Furthermore, all records should be managed in a
centralized repository so that future uses for these items are possible.
The library was able to identify individual rights holders by sheer
tenacity and willingness to put in the hours required. The procedure is
prolonged and often takes many turns. Clear guidelines and processes
are needed in order to track progress and to properly manage the
relations established during the negotiations. Without proper record
keeping the process breaks down and accuracy suffers. Because of
excellent record keeping the Library was able to pursue this project
and obtain as many permissions as it has.
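The kind of record keeping described here can be illustrated with a minimal sketch. It assumes one case per recording, a simple contact log, and the issue categories listed earlier in this chapter. All class, field, and identifier names are hypothetical and do not describe the Library's actual system.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Issue(Enum):
    """Obstacle categories taken from the chapter's list of issues."""
    HOLDER_IN_DISPUTE = "rights holder in dispute"
    HOLDER_INCOMMUNICADO = "rights holder incommunicado"
    HOLDER_IS_ESTATE = "rights holder is the estate of a deceased person"
    UNFAMILIAR_WITH_TECHNOLOGY = "unfamiliar with technology"
    AGGRESSIVE_COUNSEL = "represented by aggressive counsel"
    CANNOT_RECALL_CONTENT = "unable to remember contents of discussion"
    UNCOMFORTABLE_WITH_VENUE = "uncomfortable with venue"


class Status(Enum):
    """Hypothetical negotiation states (not named in the chapter)."""
    NOT_CONTACTED = "not contacted"
    IN_NEGOTIATION = "in negotiation"
    GRANTED = "permission granted"
    DECLINED = "permission declined"


@dataclass
class ContactEvent:
    when: date
    note: str


@dataclass
class PermissionCase:
    recording_id: str   # hypothetical local identifier for the audio asset
    rights_holder: str
    status: Status = Status.NOT_CONTACTED
    issues: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def record_contact(self, when, note, issue=None):
        """Append a contact to the log and note any obstacle encountered."""
        self.log.append(ContactEvent(when, note))
        if issue is not None:
            self.issues.add(issue)
        if self.status is Status.NOT_CONTACTED:
            self.status = Status.IN_NEGOTIATION

    def close(self, granted):
        """Record the final outcome of the negotiation."""
        self.status = Status.GRANTED if granted else Status.DECLINED


def publishable(cases):
    """Return the identifiers of recordings cleared for public access."""
    return [c.recording_id for c in cases if c.status is Status.GRANTED]
```

Keeping the contact log and the outcome in one record makes it possible to see at any time which recordings are cleared and which obstacles recur most often.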
The challenges encountered in obtaining rights for 24 Hours at
NYPL highlighted the need for the Library to develop a more robust
digital rights management system. While releases had been obtained
for the original recordings, these releases were paper documents kept
in separate files. There was no database to manage these documents
or the relationship the documents had to the assets. All this work
had to be done before the work of seeking permission could begin.
Furthermore, evaluating the releases to determine the extent of the
Library's rights was necessary before any approach could be made.
These preparatory tasks are often considered incidental to digital
projects since digital rights are often discussed as a separate case or
special case. However, 24 Hours at NYPL underscored the need to tie
digital rights management to general collection management. In this
case, the relationship was drawn between the previously obtained
releases and the audio assets. In the case of the earlier project Digital
Gallery the issue was obscured because of the decision to include only
items that were either in the public domain or obtained for the specific
purpose of being included in the gallery. Nonetheless, the accurate
documentation in the bibliographic record was the basis for the
management of the digital rights.
This understanding has led the Library to create a robust Digital
Rights Application that will manage scanned paper documents such as
Deeds of Gift as well as rights information. The data will be managed
centrally and tied to both the bibliographic databases and the Fedora
Digital Repository. Rights data that is stored in a structured format
can be shared with multiple systems and exported for administrative
purposes and exposed for OAI harvesting.
Additionally, work is being done to educate staff to consider
digital rights information as vital as provenance to the acquisition
process, and to include these data in any supporting documentation.
New guidelines for managing these discussions are currently
being drafted. The focus is to gather as much relevant information as
possible at the time collections are acquired. Protocols such as those
listed below are among the suggested practices:
• Identifying all known individuals contained in items in the
collection
• Identifying all known authors and/or creators
• Identifying all known third party rights holders
• Identifying permissions and/or releases from third party rights
holders
• Identifying documentation related to rights such as releases,
contracts, etc.
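The items listed above could be captured as one structured record per item, so that the data can be exported to other systems. The following is a minimal sketch under that assumption. The class name, field names, and identifier format are hypothetical and are not the schema of the Library's Digital Rights Application.

```python
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class RightsRecord:
    """Rights information gathered when an item is acquired."""
    item_id: str  # hypothetical identifier linking to the bibliographic record
    individuals: list = field(default_factory=list)       # people appearing in the item
    creators: list = field(default_factory=list)          # authors and/or creators
    third_party_rights_holders: list = field(default_factory=list)
    permissions: list = field(default_factory=list)       # releases from third parties
    documents: list = field(default_factory=list)         # releases, contracts, etc.

    def missing_fields(self):
        """Names of list fields that are still empty.

        In this sketch an empty list means "not yet recorded"; a real
        system would also need a way to say "checked, none exist".
        """
        return [f.name for f in fields(self)
                if f.name != "item_id" and not getattr(self, f.name)]

    def to_json(self):
        """Serialize the record for exchange with other systems."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A record with empty fields can then be flagged for follow-up at acquisition time, before any digitization work begins.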
To the best of its ability the Library is also aiming to secure rights
as early as possible in the life of a digital object - or in some cases, the
life of a physical object since this informs almost all future decisions.
This is a change in practice that is being implemented slowly but
successfully in the Research Libraries.
There are many ripple effects to these new practices. In the future
the decision to acquire or not acquire a collection may rest on the
status of digital rights. Furthermore, the collection development of the
entire Library may shift because of these factors. Certainly, the role of
the collection development staff will change.
Summary
The Library's role as a content creator, provider and distributor in the
age of podcasts and webinars, wikis, blogs and Twitter is becoming
critical. Access to new content areas is challenging especially when
intellectual property rights are involved. However, reluctance by
rights owners can be overcome by providing written transcriptions as
memory aids to help make informed choices. Persistence is key. Often
a donor's point of view can be swayed by gentle persuasion and
expressed sensitivity to privacy issues.
Obtaining permissions and releases is essential for success. If
not already in hand, these allowances are slow and
difficult to obtain, which can impede the completion of the project.
Most of the work with digital collections whose rights
belong to a party other than the library lies in the proper recording and
management of that rights information. All parties involved with these
collections, including curators, directors, and counsel must share an
understanding that rights information is not incidental, but essential
to properly manage these collections.
Infrastructure - both technical and organizational - that supports
the recording and management of detailed rights data should be
implemented in libraries that wish to embark on similar projects.
Obtaining these data as early on in the acquisition process as possible
is optimal since researching these data is labor intensive and often
precludes successful completion of these projects. Staff at all levels
should be informed of these best practices and educated in the proper
procedures for acquiring and documenting rights.
References
ARALOC. (2010). Cross platform DRM. Retrieved on May 2, 2010
from http://araloc.com/
Berkeley Digital Library. (2007). Copyright, intellectual property
rights, and licensing issues. Retrieved on May 2, 2010 from
http://sunsite.berkeley.edu/Copyright/
Discretix. (2010). Multi-Scheme DRM client. Retrieved on May 2, 2010
from http://www.discretix.com/DRM/index.html?source=
adwordsCusGdrm-phrase888manufacturersioo2i5&gclid=
CKqHsMKX3KACFdlw5QodcyfZBQ
New York Public Library Digital Gallery. (n.d.). Retrieved on May 2,
2010 from http://digitalgallery.nypl.org.
North Carolina ECHO. (n.d.). Digitization guidelines. Retrieved on
May 2, 2010 from http://www.ncecho.org/dig/
guide_iplanning.shtml
Premis. (2010). Preservation metadata maintenance activity.
Retrieved on May 8, 2010 from
http://www.loc.gov/standards/premis/
Stream Media. (2010). Streaming Media Hosting is your expert for
DRM - Digital Rights Management solutions. Retrieved on May 2,
2010 from http://www.streamingmediahosting.com/drm.htm?
gclid=CMLolpWX3KACFWV75QodEAyXDA
Washington State Library. (n.d.). Digital best practice. Retrieved on
May 2, 2010 from http://digitalwa.statelib.wa.gov/newsite/
projectmgmt/vendors.htm
Dodson, Howard. (1998). African American Women Writers of the
19th Century. Retrieved on May 2, 2010 from
http://digital.nypl.org/schomburg/writers_aa19/intro.html
The In-House Digital Laboratory:
Possibilities and Responsibilities
Andrea Buchner (Center for Jewish History)
Abstract
The Gruss Lipper Digital Laboratory at the Center for Jewish History
has been building sustainable digital collections since 2005. It has
worked on more than 100 different projects and digitized a wide
variety of materials. This chapter highlights projects that include the
digitization of books, photographs, and archival collections. It covers
digitization from start to finish, touching upon the selection of
materials for digitization, digitization practices and workflows, the
management of digital assets, online delivery and users' reactions to
digital collections. It concludes with a discussion of the feasibility of
an in-house digital laboratory for archives and libraries in general.
Keywords: Digital asset management, Digital imaging, Digital
laboratory, Digital preservation, Digitization, Digitization projects
management, Digitization standards, Metadata, Sustainable digital
collections.
Introduction
The Gruss Lipper Digital Laboratory at the Center for Jewish History
was established in 2005 with the generous support of the Gruss
Lipper Family Foundation. Today, the lab is at the center of a digital
program whose mission is to preserve the digital assets of the Center
and its five partner organizations and to ensure long-term access to
them. Since 2005, the digital lab has been involved in more than 100
digitization projects and has created more than 68,000 high-quality
digital images. It has also created 300 hours of high-quality digital
audio files by digitizing oral histories and Sephardic music stored on
cassette tapes and reel-to-reel tapes.
The digital lab manages all digital assets via ExLibris' DigiTool, a
digital asset management system for libraries and archives
(http://www.exlibrisgroup.com). The public interface is CJH Digital
Collections (digital.cjh.org). The digital collections are an integral part
of the Center's Online Public Access Catalog (catalog.cjh.org). This
catalog provides access to the Center's and partners' archival
collections and library holdings. If a digital reproduction exists in CJH
Digital Collections, the respective catalog record links to it. As of
December 2009, 21,000 digital objects were accessible. In 2009 alone,
174,000 users requested digital objects. All materials are accessible to
the public free of charge and the collections are constantly growing.
This chapter will examine a variety of projects. They illustrate
approaches to the digitization of different types of materials: books,
photographs, and archival collections. This chapter will also discuss
digital asset management, online delivery, users' reactions to digital
collections, and the feasibility of an in-house digital laboratory.
The Gruss Lipper Digital Laboratory
The Gruss Lipper Digital Laboratory (digital lab) is a department
within the Center for Jewish History.
The Center for Jewish History is the home of five prominent
Jewish institutions dedicated to history, culture, and art. They are the
American Jewish Historical Society (AJHS), the American Sephardi
Federation (ASF), the Leo Baeck Institute (LBI), the Yeshiva
University Museum (YUM), and the YIVO Institute for Jewish
Research (YIVO). The combined holdings of the partner organizations
bring together the rich variety of Jewish historical experience, ranging
from Eastern European Jewry, Sephardic Jewry, German-speaking
Jewry, to the American Jewish community.
The collections include more than 500,000 volumes, 25,500
linear feet of archival documents, and thousands of museum objects.
Since its opening in October 2000, over 40,000 researchers have
visited the Center and roughly 210,000 patrons have attended various
public exhibitions, conferences, and other events that the Center and
partners held.
The Gruss Lipper Digital Laboratory, a department of the Center,
collaborates with all five partner organizations in numerous ways by
digitizing their collections, managing the resulting digital assets, and
providing digital consultancy services. During the first two years of the
lab's operation it was fully funded through the grant from the Gruss
Lipper Family Foundation. When this period ended in 2007, the
Center made the digital lab an integral part of its functions and
services and continued to offer the lab's digital collections building
services to the partner organizations. In order to defray some of the
costs, however, the digital lab charges for its services whenever
external funding becomes available to the partners. When this is not
the case, the lab provides pro bono digitization services.
A typical digitization project starts with an idea put forth either by
Center or partner staff. This is followed by discussions, examination
of the materials, and established procedures (including the
completion of forms) which ensure that materials arriving in the lab
have been thoroughly vetted in regard to their value, physical
condition, rights, and available metadata. Once the selected materials
have been prepared and arrive in the lab, the process of digitization
begins. Projects may involve one or multiple partner organizations.
Projects
When it comes to determining the value of materials, the lab works
closely with partner staff. Value lies in the exceptional coverage of a
topic, the uniqueness (scarcity) of materials, how well they represent
the unique strengths of a partner's collection, and their appeal to the
partner's and the Center's audience. Materials must also be in the
public domain or be orphan works (i.e., copyrighted works whose
owner cannot be identified or located; see Orphan Works, 2009), or
the partner must have permission from copyright holders to make
digital reproductions of them accessible online. Preservation needs
also play a role in selecting materials.
Among the materials digitized in the lab are 137 Yiddish and
Hebrew children's books from the Yeshiva University Museum (YUM)
and the YIVO Institute for Jewish Research (YIVO). The books document the
development of Jewish children's literature from the turn of the
century until the onset of the Holocaust in Eastern Europe. The
majority of the books are rare and only available at the Center. They
are in the public domain or orphan works and could therefore be
made accessible via CJH Digital Collections (digital.cjh.org). The
project began as a pilot project funded by the Metropolitan New York
Library Council (METRO) in 2007 and grew into a larger project when
a private family foundation decided to fund the digitization of
additional books. The Center also contributed the digital images to the
International Children's Digital Library Foundation which has made
them accessible through its website.
In addition to children's books, the digital lab has digitized rare
books from the Leo Baeck Institute (LBI). The digital lab has been
digitizing rare books since its inception. Funding has come from a
variety of sources; originally from the Gruss Lipper Family
Foundation, then from METRO and now through a private donor.
Most recently, the lab digitized manuscripts from the 15th-16th century
pertaining to a famous Renaissance controversy between the Christian
Hebraist Johannes Reuchlin and the anti-Jewish agitator Johannes
Pfefferkorn, who advocated for the destruction of all Jewish books.
The digital lab is ideally suited for the digitization of rare books. It is
an in-house lab which reduces the risk of loss and damage that can
occur during the shipping of books to external vendors. Moreover, the
digitization staff is given the time to treat each book carefully and
according to its often fragile or tightly-bound condition. This ensures
that they are not damaged during the digitization process.
As for photographs, the digital lab has worked on dozens of
projects during its first two years of operation, having digitized
thousands of photographs (prints and negatives) as well as slides.
Among the highlights are 3,200 photographs of Moroccan and
Turkish synagogues from the American Sephardi Federation (ASF).
The digital lab has digitized both portions of archival collections
and complete collections. From the LBI, it has digitized rare
oversize materials, selected for their rarity and fragility.
Among the highlights are letters of protection for Jewish communities
in Germany from the 18th century. Letters of protection were issued to
Jews by local authorities in Europe. These letters gave Jews the right
to reside in a town in exchange for a special tax.
The digital lab has digitized a few archival collections in their
entirety. The largest collection was from YUM. It consisted of 30
linear feet of materials, mostly fashion drawings by the New York City
fashion designer Abe Grubere. However, this collection is an exception
and archival collections digitized in the lab are usually small, such as
the Raphael Lemkin Collection from the American Jewish Historical
Society (AJHS) that consists of 7.5 linear feet. Raphael Lemkin was a
scholar who coined the term "Genocide." He was instrumental in the
United Nations' adoption of the Convention on the Prevention and
Punishment of the Crime of Genocide in 1948. The collection
documents Lemkin's lifelong effort to prevent genocide and lobby the
United Nations to adopt an anti-genocide convention. In total, the lab
digitized the content of 76 folders, which translated into 4,033 images.
The materials were digitized as part of an initiative at the Center that
culminated in a conference and an exhibition on genocide in 2009.
Copyright and Fair Use
When materials are selected for digitization the copyright status must
be clear. As for the YUM and YIVO children's books and LBI rare
books, the books were either in the public domain or orphan works.
They could be made accessible via CJH Digital Collections without
restrictions.
With regard to the photographs of Turkish and Moroccan
synagogues, however, the situation was more complicated. There was
a subset of 500 photographs of the 3,000 total, taken in 1989, where
the ASF and the photographer jointly held the copyright. The
photographer was then consulted for his permission to digitize the
photographs and to make the digital reproductions accessible online
via CJH Digital Collections. His permission was obtained by means of
a license agreement.
As for the Raphael Lemkin Collection, the AJHS had permission
from Lemkin's heir to make digital reproductions of the collection
accessible. Among the selected materials was much correspondence
from third parties. They were letters from various activists to people,
committees or organizations that had the power to influence the
United Nations or various governments to adopt and/or ratify the
convention. The letters were protected by copyright. AJHS and the
Center started to contact copyright holders and permission was
granted whenever possible. We also reviewed the four factors of "fair
use" and concluded that the factual nature of the correspondence and
our intended use allow for making the letters accessible to the public
for "fair use" (see Fair Use, 2009). Each digital folder containing
letters was associated with a "click through" copyright notice,
outlining the allowed terms of "fair use" such as that no use other than
research, teaching, and private study is allowed without prior
permission from the copyright owner. Copyright owners not properly
identified, or any user with information on the copyright status of a
particular work, are asked to contact the Center and AJHS so that
appropriate information can be provided in the future. Every user
desiring access to the digital materials must agree to these terms by
clicking on a button before gaining access to them. The fair use
information can also be found in the descriptive metadata record that
is attached to each letter and at the bottom of the CJH Digital
Collections website.
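The click-through behavior described above can be sketched as a small access gate. This is only an illustration: it assumes that acceptance is remembered per user session, which the chapter does not specify. The notice text is a paraphrase, and the class and method names are hypothetical, not part of DigiTool or the CJH Digital Collections site.

```python
# Paraphrase of the terms described in the chapter, not the actual wording.
NOTICE = ("Use other than research, teaching, and private study requires "
          "prior permission from the copyright owner.")


class ClickThroughGate:
    """Withholds a digital folder until the user has accepted the notice."""

    def __init__(self, notice=NOTICE):
        self.notice = notice
        self._accepted = set()  # assumption: acceptance is tracked per session

    def accept(self, session_id):
        """Record that the user clicked the button agreeing to the terms."""
        self._accepted.add(session_id)

    def open_folder(self, session_id, folder_id):
        """Return the folder if the terms were accepted, else the notice."""
        if session_id not in self._accepted:
            return {"access": False, "show_notice": self.notice}
        return {"access": True, "folder": folder_id}
```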
Image Capture
The Gruss Lipper Digital Laboratory has the following equipment: a
BetterLight Super 8K-HS digital scanning camera back and a Mamiya
645ZD 22 megapixel (5,336 x 4,008 pixels) medium format single-
shot digital camera. The scanning workstation consists of a motorized
AIAXact 3040 DV copystand, four Buhl HID 150W 4k softcube lights
as well as two book cradles. The lab also has an Epson Expression
10000 XL Photo flatbed scanner with transparency adapter and a
Minolta DiMage Scan Multi PRO AF-5000 multi-format film scanner.
The digital lab also has a TEAC 860-R cassette deck, an Otari Mx-
5050bII reel-to-reel tape machine, and a Behringer UB502 mixer.
This equipment allows for the digitization of the different types of
materials found in archives: bound books, unbound materials,
oversize materials, photographs, film, slides, transparencies, cassette
tapes and reel-to-reel tapes.
Figure GLDL-1: Gloria Machnowski digitizes a decree from the Leo Baeck Institute
with the BetterLight camera. © 2009. Gruss Lipper Digital Laboratory. Used with
permission.
When the digital lab receives a collection to be digitized, it tracks
the collection, item, and images eventually to be created from the item
in a Microsoft Access "tracking" database. For the actual digitization,
the digital lab follows standards established by the National Archives
(Technical Guidelines for Digitizing Archival Materials for Electronic
Access: Creation of Production Master Files - Raster Images, 2004)
which include guidelines for the creation of high-quality archival
master images, quality assurance and accompanying metadata. As
these guidelines recommend, each image contains a color scale and
gray scale as color and tone references and a ruler as a reference for
the size of the original document. Depending on the original, the
digital lab produces 400 ppi 8-bit grayscale, 400 ppi 24-bit color, or
600 ppi 1-bit bitonal TIFF images with LZW or ITU-T compression.
Photographic prints and film are an exception: their resolution is
determined by the size of the original and the minimum number of
pixels needed to produce a high-quality image. 8-bit grayscale images
are tagged with a Gray Gamma 2.2 profile, 24-bit color images with an
Adobe RGB (1998) ICC color profile.
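The rule for photographic prints and film, where resolution follows from the size of the original and a minimum pixel count, can be illustrated with a short calculation. The sketch below is an illustration only: the function name and the 4,000-pixel target on the long edge are assumptions chosen for the example, not figures reported by the lab.

```python
import math


def required_ppi(long_edge_inches: float, target_pixels: int) -> int:
    """Return the scanning resolution (ppi) needed so that the long edge
    of the original yields at least `target_pixels` pixels."""
    if long_edge_inches <= 0:
        raise ValueError("long_edge_inches must be positive")
    if target_pixels <= 0:
        raise ValueError("target_pixels must be positive")
    # Round up so the minimum pixel count is always met.
    return math.ceil(target_pixels / long_edge_inches)


# Hypothetical target: 4,000 pixels along the long edge.
print_ppi = required_ppi(10.0, 4000)  # a 10-inch photographic print
film_ppi = required_ppi(1.5, 4000)    # a small film original, 1.5 inches long
```

The smaller the original, the higher the resolution needed to reach the same pixel count, which is why a single fixed ppi value does not suit film and small prints.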
Recommended technical and administrative metadata that is not
embedded automatically in the TIFF image during the image capture
process, such as the name and title of digitization staff, scanning
facility (Gruss Lipper Digital Laboratory), organization (Center for
Jewish History), and contact information, is added to fields in the
TIFF header via Adobe's Photoshop software.
The TIFF images are not post-processed. The goal is to create a
digital archival master file that reproduces the original as closely as
possible. During the quality assurance process, the TIFF images are
inspected for orientation, sharpness, color, contrast, noise, and dust.
Images failing this inspection (images that are skewed, out of focus,
over- or underexposed, show digital noise, dirt, or dust) have to be
rescanned. Access copies for web delivery are created either in the
JPEG2000 or PDF format. JPEG2000 is a format that has an efficient
compression rate allowing for the online delivery of high-resolution
images into which users can zoom to appreciate fine details. This
format works well with illustrated children's books, rare books and
oversize materials as well as photographs. At the maximum zoom
level, the full resolution and detail of the JPEG2000 image can be
appreciated.
For collections consisting of mostly type-written materials, such
as the Raphael Lemkin Collection, the delivery format is PDF. The
files are small and load quickly and the full text of each type-written
document is searchable thereby allowing users to quickly search the
content of a digitized folder by keyword.
Metadata
The Gruss Lipper Digital Laboratory records descriptive,
administrative, and structural metadata via a variety of community
standards. It has adopted MARC 21 (MARC Standard, 2010) as the
standard defining the structure of a catalog record and AACR2
(Anglo-American Cataloging Rules, 2005) as a content standard. The
standards are justified since MARC records already exist in the
Center's library catalog for many of the items digitized in the lab. They
can easily be imported into the Center's digital asset management
system using the Z39.50 protocol. The staff of the Center and Center's
partners consists of trained librarians who update records when
needed.
If records are created from scratch and do not yet exist in the
Center's OPAC, the digital lab recommends DACS as a content
standard. DACS (see SAA: Describing Archives: A Content Standard,
n.d.) was not conceived with the space constraints of a library catalog
card in mind and therefore eliminates many of the abbreviations
present in AACR2, making it a more user-friendly standard in an
online environment.
The lab has used MODS (Metadata Object Description Schema,
2010) for selected projects, but has not yet officially adopted that
standard. The lab has developed cataloging guidelines for a MARC 21
digital collections minimum record and a MARC 21 digital collections
core record.
The MARC digital collections minimum and core record are based
on the idea of levels of description as put forth in DACS, with the
minimum and core record roughly corresponding to DACS' single-
level minimum and single-level optimum/added value descriptions.
Particularly important is the use of controlled vocabulary. Controlled
vocabulary allows for the building of browsing categories in the
Center's digital collections portal, CJH Digital Collections
(digital.cjh.org). These virtual collections allow users to search for
relevant materials across collections and partners. Controlled
vocabulary also ensures that a user retrieves all relevant materials
when searching by keyword. Subject headings are taken from the
Library of Congress Subject Headings (LCSH); the thesauri used are
the Art and Architecture Thesaurus (AAT) and the Thesaurus for
Graphic Materials (TGM).
The virtual collections in CJH Digital Collections are based on
type (the lab uses genre terms to build the "collections by type"
browsing hierarchy, such as "children's books," "drawings,"
"photographs," etc.), repository (the Center and its five partners), and
provenance (title of a collection). Most recent research recommends
organization by subject (Schaffner, 2009). The lab has begun this type
of organization for a selected set of photographs from YIVO (subjects
are Holocaust, political life, Yiddish theatre and others).
A "collection by type" to be highlighted is "archival finding aids."
CJH Digital Collections hosts 8,200 EAD finding aids, many of them
converted from MARC as part of a recently completed project. Our
goal is that eventually all finding aids will link to digital objects, if the
archival collection has been digitized. These finding aids constitute an
additional access point to the digital objects.
Finally, the digital lab also uses OAI-PMH, the Open Archives
Initiative Protocol for Metadata Harvesting. The use of this protocol
allows for the harvesting of all MARC records in CJH Digital
Collections by OAIster, a union catalog of more than 23 million
records from more than 1,100 contributors. OAIster records are freely
available through WorldCat.org.
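To make the harvesting step concrete, the following minimal sketch builds the kind of OAI-PMH ListRecords request a harvester would send. The base URL is a placeholder, not the Center's actual endpoint. The `oai_dc` prefix is the one format every OAI-PMH repository must support; the prefix under which a given repository exposes MARC records varies and would have to be checked with a ListMetadataFormats request.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used for illustration only.
url = list_records_url("https://repository.example.org/oai",
                       from_date="2009-01-01")
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the record list is complete.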
Creation of MARC Records
When MARC records already exist in the Center's OPAC, as is the
case for all rare books from the LBI, they are imported into the lab's
digital asset management system and adjusted in terms of genre terms
and provenance to ensure they appear within the appropriate
browsing category in CJH Digital Collections.
For some of the children's books from YUM, MARC records did
not exist in the Center's library catalog. YUM, as the only museum
among the Center's partners, has its own database that is separate
from the Center's library catalog since museum objects follow
different rules for description. For these books, a children's book
MARC record template was created in DigiTool which was used by the
lab's metadata librarian who cataloged the children's books based on
the information available in the museum database. For the ASF
photographs that were digitized by the digital lab, just as with the
YUM children's books, no MARC records existed in the Center's
OPAC. In general, not much information was available for them,
which is not a surprise since photographs that are part of archival
collections are rarely cataloged on the item level. The little
information partners usually have about photographs at the point of
digitization (call number, dimensions, rights, and the repository that
owns them) is generally transferred to the lab's tracking database. Once
digitization is completed, the data is exported from the database,
converted into MARCXML and, together with the images, ingested into
the Center's digital asset management system. These MARC records
with minimal description are designed to be enhanced by partner
catalogers, based on the lab's cataloging guidelines, once they have
obtained the missing information, such as title, date, and other added
entries.
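The conversion from a tracking-database row to a minimal MARCXML record might look like the sketch below. It is not the lab's actual export routine: the column names, the mapping of columns to MARC fields (852 $a and $j, 300 $c, 540 $a), and the placeholder leader are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

# Hypothetical mapping of tracking-database columns to MARC tag/subfield.
FIELD_MAP = {
    "call_number": ("852", "j"),  # shelving control number
    "repository": ("852", "a"),   # location (owning repository)
    "dimensions": ("300", "c"),   # physical description, dimensions
    "rights": ("540", "a"),       # terms governing use and reproduction
}


def tracking_row_to_marcxml(row: dict) -> str:
    """Convert one tracking-database row into a minimal MARCXML record."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    leader = ET.SubElement(record, f"{{{MARC_NS}}}leader")
    leader.text = "00000nkm a2200000 a 4500"  # illustrative placeholder leader

    # Group subfields by tag so that both 852 values share one datafield.
    fields = {}
    for column, (tag, code) in FIELD_MAP.items():
        value = row.get(column)
        if value:  # skip information the partner has not supplied
            fields.setdefault(tag, []).append((code, value))

    for tag in sorted(fields):
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": " ", "ind2": " "})
        for code, value in sorted(fields[tag]):
            subfield = ET.SubElement(
                datafield, f"{{{MARC_NS}}}subfield", {"code": code})
            subfield.text = value

    return ET.tostring(record, encoding="unicode")


# Invented sample row, for illustration only.
sample_xml = tracking_row_to_marcxml({
    "call_number": "PH-0001",
    "repository": "Example Repository",
    "dimensions": "20 x 25 cm",
    "rights": "Research, teaching, and private study only.",
})
```

Because empty columns are skipped, the resulting record carries only what is known at the point of digitization and can be enriched later with a title, date, and added entries.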
In addition to descriptive metadata, the digital lab also creates
structural metadata. This type of data is necessary for complex objects
such as books, archival folders, and archival documents that consist of
more than one page. Structural metadata preserves the physical
and/or logical structure of such complex/multi-page objects in a
digital environment. It allows users to page through a faithful
reproduction of the original online. The digital lab uses METS (see
METS Metadata encoding and transmission standard, 2010) as the
standard for structural metadata.
The structure or "structural map" as it is known in METS is
determined by the physical and/or logical structure of a physical item.
Re-creating structures for books is less complicated and time-
consuming since pages are usually numbered and chapters exist.
Much more time-consuming is the recreation of the structure of an
archival folder. Description usually only exists on the folder level, but
a structural map for a folder reaches down to the item level. How
detailed the structural map will be depends on how much description
is necessary to make the content intelligible to users.
University archivists reported during the More Product, Less
Pixels session at the SAA Annual Meeting in 2008 that students were
frustrated when working with digital archival collections with no item
level description. In comparison to a physical folder which can be
opened up quickly and paged through, students felt that it was more
difficult to determine if the content of a folder was useful to them or
not. They complained about an "online microfilm experience" (see
Session 49: More Product, Less Pixels: Alternate Approaches to
Digitization and Metadata, 2008).
Due to the relatively small size of the Raphael Lemkin Collection,
the digital lab decided to facilitate users' experience and to describe
the content of each folder not only on the folder level but also the item
level. Folder information came from the finding aid but the item level
information was supplied by the digital lab's metadata librarian. The
results are extremely detailed structural maps. For example, the
structural map of a digital folder containing correspondence consists
of labels for each letter containing the name of the correspondent and
date of creation, and all letters are organized by date.
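As a rough sketch of such a structural map, the code below builds a METS structMap for one correspondence folder, with one labeled division per letter, ordered by date. The folder label, correspondent names, and file identifiers are invented for the example. In a complete METS document, each FILEID would also have to match a file declared in the file section.

```python
import xml.etree.ElementTree as ET
from datetime import date

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_folder_struct_map(folder_label: str, letters: list) -> ET.Element:
    """Build a METS structMap with one <div> per letter, sorted by date."""
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "logical"})
    folder_div = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div",
        {"TYPE": "folder", "LABEL": folder_label})

    ordered = sorted(letters, key=lambda letter: letter["date"])
    for order, letter in enumerate(ordered, start=1):
        # Label each letter with correspondent and date of creation.
        label = f'{letter["correspondent"]}, {letter["date"].isoformat()}'
        letter_div = ET.SubElement(
            folder_div, f"{{{METS_NS}}}div",
            {"TYPE": "letter", "ORDER": str(order), "LABEL": label})
        for file_id in letter["file_ids"]:  # one pointer per page image
            ET.SubElement(letter_div, f"{{{METS_NS}}}fptr",
                          {"FILEID": file_id})
    return struct_map


# Invented sample data, for illustration only.
sample_map = build_folder_struct_map("Correspondence, 1947-1948", [
    {"correspondent": "Correspondent B", "date": date(1948, 5, 2),
     "file_ids": ["FILE0003"]},
    {"correspondent": "Correspondent A", "date": date(1947, 1, 15),
     "file_ids": ["FILE0001", "FILE0002"]},
])
```

A viewer can turn the LABEL attributes into a table of contents, which is what lets users jump to a single letter instead of paging through the whole folder.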
Moreover, additional information that the lab's metadata
librarian gleaned from studying the letters and particularly Lemkin's
index cards and research notes was also incorporated. This was done
by adding subject headings and added entries to the MARC record
that is attached to each digital folder. Finally, AJHS personnel
updated the finding aid with links to the digital folders.
File Management and Online Display
The digital lab manages all digital assets via the digital asset
management system DigiTool, which has been designed for digital
asset management in libraries and archives. It supports a variety of
community standards and file formats including audio and video,
supports complex objects, assigns unique and persistent identifiers to
each digital image and metadata, supports basic preservation tasks
such as tracking of changes to objects and metadata, saves this data as
well as all other metadata in xml, and allows for digital asset
management en masse, all of which reduces the risk of loss or digital
obsolescence. The physical infrastructure of the digital repository is
provided by a StorageTek FlexLine 210R storage unit. This storage
unit currently contains 10 TB of RAID-5 storage, and a tape library
system for daily backups to tape. Backup tapes are stored off-site.
When it comes to storage and backup, the digital lab works very
closely with the Center's IT department.
DigiTool, while primarily a digital asset management system, also
powers the Center's digital collections portal, CJH Digital Collections.
A variety of viewers allow for the display of all the diverse digital
assets described in this chapter, such as simple objects like
photographs as well as complex objects like books and archival
folders. However, while the viewers are relatively sophisticated
(offering zooming and rotation features, browsing by a thumbnail
gallery or a table of contents created through METS), the interface
cannot be customized beyond color schemes and fonts. External
applications can be embedded, such as an Adobe Flash player, and
whenever this is done, the Center's web designer is of great assistance.
However, one has to keep in mind that DigiTool was primarily
designed to manage digital assets and not to create exciting web
exhibitions.
User Experience
The digital lab has gained much experience in the creation and
management of digital assets. Over the last four years, it has built
sustainable digital collections based on its mission to preserve them
and to ensure their accessibility for the long term. However, creation
and management of sustainable digital assets is only one side of
digitization; the other is users' reactions to digital collections. Since
early 2009, DigiTool has generated reliable usage statistics, which the
lab has started to analyze. At the same time, the lab also joined the
photo-sharing website Flickr
(http://www.flickr.com/photos/center_for_jewish_history) and made
user surveys available in CJH Digital Collections to obtain feedback.
Usage statistics for CJH Digital Collections reveal that on average
in 2009, the site was visited by 14,500 users a month (ca. 470 people a
day). Of course, certain collections are requested more often than
others. Among the materials discussed in this chapter, the children's
book collection ranks the highest in terms of requests, having been
requested by users 30 to 70 times between March and December
2009. The Raphael Lemkin Collection and LBI rare books rank
second, with 20-30 requests per folder or book during the same time
period, with one rare book, "Juedisches Ceremoniel" standing out,
having been requested more than 78 times. Photographs of Turkish
and Moroccan Synagogues are among digital objects least requested
(1-10 times during the same period). The discrepancies in requests can
be explained by how recently a project was completed and by the number
of access points through which its materials are available. While the
children's books and synagogue
photographs are projects that started three years ago, the children's
books project was completed only in 2009 when it was incorporated
into a bigger children's books digitization initiative. The children's
books are currently featured in a virtual collection dedicated to
recently completed projects in CJH Digital Collections. They are also
accessible through the website of the International Children's Digital
Library Foundation and 14 of them are available on Flickr, as is the
"Juedisches Ceremoniel."
The Raphael Lemkin materials have been accessible online in their
entirety only since September 2009. However, the number of requests
is high due to the publicity they received in conjunction with the
November 2009 Genocide and Human Experience Conference at the
Center. In the course of the advertising campaign for this conference,
selected materials were featured on the conference's home page and in
a dedicated web exhibition.
While the original purpose of joining Flickr was to gain feedback
from users (DigiTool does not allow for users to leave comments) it
became clear quickly that Flickr is also a great tool to promote the
Center and partners' digital collections. The 189 images on Flickr (as
of 11 December 2009) were viewed more than 12,300 times over the
course of eight months. This means that on average, an image on
Flickr has been requested 65 times. This is a much higher average
number of requests for images in comparison to CJH Digital
Collections. The lab was also contacted and asked if some images
could be featured on a personal blog. Moreover, people responded to
AJHS' requests to identify unknown people in photographs. Among
the people who offered their help was the great-granddaughter of one
of the people featured in a photograph. Users tagged images with the
occasional "that's a wonderful image" remark. There were no negative
remarks. Considering the feedback and number of requests per image,
the Center and participating partners deemed the project a success
and the uploading of images to Flickr and the administration of the
Center's Flickr site has become part of the digital lab's services.
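The per-image average quoted for Flickr, and the daily visitor figure quoted earlier for CJH Digital Collections, can be checked with simple arithmetic. The short sketch below uses only the figures given in this chapter; the function name is introduced here for illustration.

```python
def average_requests_per_item(total_requests: int, item_count: int) -> float:
    """Return the mean number of requests per item."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return total_requests / item_count


# Flickr: more than 12,300 views spread over 189 images.
flickr_average = average_requests_per_item(12_300, 189)

# CJH Digital Collections: 14,500 visitors a month, averaged over a year.
daily_visitors = 14_500 * 12 / 365
```

Since the view count is reported as "more than 12,300," the Flickr average is a lower bound; it rounds to the 65 views per image cited above, and the daily figure is consistent with the "ca. 470" given for the portal.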
The surveys created to obtain users' reactions to digital
collections in CJH Digital Collections indicate that users are generally
impressed with the high quality of the images and their delivery. They
also show that users sometimes get frustrated, citing an inability to
find what they are interested in. Users also find it difficult to get a
sense of the amount of material online, its content, and how the
collections relate to one another.
From analyzing usage statistics, surveys, and reaching out to
users through Flickr we have learned that it will become increasingly
important to better promote digital collections, explain more clearly to
users what our digital collections are about and how they should be
used, and connect with users in new ways.
Conclusion
The Gruss Lipper Digital Laboratory represents an impressive
investment on the part of the Center for Jewish History. During the
first two years of its operation, a grant paid for the complete
refurbishment of a room making it suitable for a digital lab, the
purchase of first-rate digitization equipment, and a staff of five people
(a director, metadata librarian, photographer, and two part-time
digital asset production associates). At this point, the Center supports
a staff consisting of two full-time and two half-time positions: one
director, one metadata librarian, and two photographers/digital asset
production associates. There are additional costs for necessary and
continuous upgrades and investment into software, hardware, and
data storage as digital collections continue to grow. The cost of
continued education for staff as digital technologies evolve also has to
be factored in. Finally, technical support that is lent to a digital lab by
an institution's IT department must be part of any lab's budget
considerations.
Because of these high costs, a state-of-the-art digital lab requires
sustained support from stakeholders. These stakeholders have to have
an understanding of the challenges that maintaining a digital program
brings and an understanding of the basic concepts behind digital
preservation. With this support and a professional and standards-
based program for digital image capture and digital management as
represented by an in-house digital laboratory, the groundwork is laid
for the building of a trusted digital repository (RLG/OCLC Working
Group on Digital Archive Attributes, 2002), the long-term goal of any
digital program.
For the investment it represents, an in-house digital lab like the
Gruss Lipper Digital Laboratory brings tremendous benefits. When
developing new digitization projects an institution can draw on
experienced staff. The digital assets remain under an institution's
control; they can be accessed at any time, and there are no access
restrictions because rights to the digital images have not been given to
a vendor. The high-quality digital images can be used for many
different purposes, ranging from web exhibitions to printing and
publishing. The professional administration of a digital asset
management system ensures that digital assets are managed safely
and securely. No file is lost, even years after a project's completion.
Strict adherence to community standards ensures that files can be
migrated to different formats if the need arises and that data can be
exchanged with other systems. The result is digital collections whose
hallmarks are usability, portability, and longevity.
There is much more to digitization than just image capture. It
includes metadata, standards, digital asset management, online
delivery and users' experience, all of which require expert knowledge
for creating useful and sustainable digital collections. If long-term
preservation of digital assets is part of an institution's mission and if
that institution remains abreast of new digitization developments and
technologies, an in-house lab is well worth the investment.
References
Anglo-American Cataloging Rules, Second Edition, 2002 Revision,
2005 Update. (2005) Chicago, IL: American Library Association.
Fair use (2009). Retrieved December 11, 2009, from U.S. Copyright
Office website, http://www.copyright.gov/fls/fl102.html
MARC standards. (2010). Retrieved December 11, 2009, from the
Library of Congress - Network Development and MARC Standards
Office website http://www.loc.gov/marc/
METS. Metadata encoding and transmission standard (2010).
Retrieved December 11, 2009, from the official web site
http://www.loc.gov/standards/mets/
MODS. Metadata object description schema. (2010). Retrieved
December 11, 2009, from the official web site
http://www.loc.gov/standards/mods/
Orphan works: Statement of best practices (June 2009). Retrieved
December 11, 2009, from Society of American Archivists' website
http://www.archivists.org/standards/
RLG/OCLC Working Group on Digital Archive Attributes (2002).
Trusted digital repositories: Attributes and responsibilities. RLG,
May 2002. Retrieved December 11, 2009, from
http://www.rlg.org/en/page.php?Page_ID=583
SAA: Describing archives: A content standard (DACS) (n.d.).
Retrieved on May 1, 2010 from
http://www.archivists.org/governance/standards/dacs.asp
Schaffner, Jennifer. (2009). Metadata is the interface. Better
description for better discovery of archives and special collections,
synthesized from user studies. Retrieved December 11, 2009, from
the OCLC website http://www.oclc.org/research/publications/
Session 49: More product, less pixels: Alternate approaches to
digitization and metadata. (2008). SAA Annual Meeting, August
26-30, 2008.
Technical guidelines for digitizing archival materials for electronic
access: Creation of production master files - raster images.
(2004). Retrieved December 11, 2009, from the National Archives
website
http://www.archives.gov/preservation/technical/guidelines.html
Building a Virtual Library - A Case Study
at The Library of The Jewish Theological
Seminary
Naomi M. Steinberger
(The Library of The Jewish Theological Seminary)
Abstract
The goal of this chapter is to show how a medium-size research library
with rich resources in special collections has succeeded, over the past
decade, in building a digital library. The chapter focuses on building a
strategic plan for a digital library and assessing The Library's
collections for digital readiness. Planning the digital library includes
making many decisions, ranging from the choice of software and the
prioritization of collections for digitization to metadata schemas.
Funding for digitization ranges from grants and gifts in kind, where
the funder provides the institution with a digital photographer, to
funders who want a specific collection digitized. There is a
discussion of some of the challenges such as
hardware capability and staffing. The important lessons learned
through experience and plans for the future of the digital program are
discussed.
Keywords: Assessment, Building digital collections, Digital
collections, Digital library, Digital project funding, Medium-size
academic library, Metadata for digital collection, Strategic planning,
Virtual library.
Background
The mission of the Jewish Theological Seminary, the intellectual and
religious center of Conservative Judaism, is twofold: to serve as the
pre-eminent center for the academic study of Judaism outside of
Israel, as one of the pre-eminent centers world-wide, and as a training
center to advance that study; and to educate Jewish professionals and
lay leaders in the spirit of Conservative Judaism for the total
community through academic and religious programs, both formal
and informal.
In accordance with the Seminary's overall goals, The Library's
mission is to collect, preserve and make available the literary and
cultural heritage of the Jewish people. The collection includes 25,000
rare books, 11,000 manuscripts, 400 archival collections, and many
other historically significant items. Among the particular strengths of
its collection are its 35,000 fragments from the Cairo Genizah
(representing the lives of Jews and others in the eastern
Mediterranean from the 11th to the 19th centuries), its collection of
ketubot (Jewish marriage contracts), the largest in the world, and its
unparalleled collection of Passover haggadot (the traditional text of
the holiday, combining narrative and ritual). The Library is also home
to the world's largest collection of Hebrew incunabula (early books
printed before 1501).
The Special Collections are open to students and researchers from
around the country and the world. Scholars of Jewish history
specializing in virtually any period or place rely on the
Library's unique holdings. Scholars of American Jewry are
particularly reliant on The Library's archives. The Library is also the
center for scholarly dialogue within The Jewish Theological Seminary
(JTS) and the focal point for numerous public events devoted to
Jewish culture and books.
Currently, The Library employs 15 librarians, five subject
specialists and eight support staff. JTS has 40 faculty and 200 staff
serving 500 students. In fiscal year 2010, after budget reductions, The
Library's operating budget is $1.7 million, plus approximately a
half-million dollars in designated funds each year. The budget
designated to digital collections is less than $12,000 per year.
In The Library's 2005 strategic plan, digitization is emphasized as
a method of preservation of the original item, because "more damage
is done to rare materials through human handling than by any other
means." "The more readers can gain access to images of rare materials
and forego handling the materials themselves," the document
continues, "the better they will be preserved." (Strategic Plan, 2005.)
Large external digitization projects such as The Google Book Project (Google Book,
2010) and the licensing of the Otzar Ha-Hochma collection of 19,000 fully-digitized
Hebrew Books and HebrewBooks.org are viewed as efforts to take care of the need
to digitize modern printed books, allowing the Jewish Theological Seminary library
to focus resources on only digitizing materials unique to the library's collection.
Digital Library Strategic Plan
After successfully completing a number of small digitization projects, it
was clear that a strategic direction for The Library was necessary. In
November 2005, we participated in the "Persistence of Memory:
Stewardship of Digital Assets" conference presented by the Northeast
Document Conservation Center (NEDCC) and partially funded by the
Institute of Museum and Library Services (IMLS). At that conference,
Tom Clareson, from PALINET, was soliciting libraries to participate in
a study funded by the IMLS to assess institutions' "digital readiness."
JTS volunteered and became one of the sample institutions for this
study.
Two members of the project team, Tom Clareson (Program
Director for New Initiatives, PALINET) and Richard Kim (then Digital
Projects Manager, Metropolitan New York Library Council), served as
the project evaluators. The assessment entailed completion of a
detailed survey that posed questions about the existing digitization
environment, plans, hardware, software, and digital preservation; a
site visit was also included. During the site visit, discussions were
held with the people responsible for selecting items for digitization
and creating metadata, as well as with scanning technicians,
conservators, reference librarians, administrators, and the JTS IT
department. The outcomes of the completed survey and the meeting
were the basis of a strategic plan for The Library's digital program.
The key interests of the library administration and staff were to
determine the best way to manage the existing and future digital
assets; to evaluate the current status of access to digital materials; to develop
strategies for future digital projects and funding for those projects;
and to begin to look at the sustainability of digital projects within The
Library's digitization program.
The assessment report outlined the following recommendations
(Clareson and Kim, 2006):
• Identification and selection of priority collections to be digitized,
including the development of a written "Digital Collection
Development Policy."
• Identification of the primary and secondary audiences for the
existing and future digital collections of JTS.
• Exploration of existing standards in content creation and
metadata.
• Consideration of upgrades to digitization equipment for future
projects, as well as exploration of working with digitization vendor
services.
• Adoption of a library-wide set of digital procedures.
• Working with a metadata consultant to make sure "retrospective"
metadata and the metadata for future projects allows researchers
to easily reach the JTS digital collections.
• Exploration of the utilization of a digital asset management system
to deliver JTS images to its constituents and to the world.
• Adoption of handling guidelines for materials to be digitized.
• Documentation of current digital preservation practices, and
working with a collaborative or third-party digital archiving
service.
Implementation of the recommendations began with a document
outlining potential unique collections to be digitized followed by
guidelines for standards in content creation and metadata and
establishment of library-wide digitization procedures. Decisions were
made regarding the upgrade of existing metadata. A digital assets
management system was licensed and there were explorations
regarding digital preservation services.
JTS, as a small academic institution, could not support an open-source
system, such as DSpace (About DSpace, n.d.) or Greenstone
(Greenstone Digital Library Software, n.d.). Our IT staff was too small
and turnover was too high in both the IT department and The
Library. We wanted to host our digital assets locally. We had a long-
standing relationship with ExLibris, as we run the ALEPH500 system
at The Library. A decision was made to license DigiTool (ExLibris
Digitool, n.d.), their digital assets management system.
Among the significant unique collections identified for digitization
were the audio-visual collections. These recordings, in multiple
obsolete formats, are completely inaccessible; it is therefore
imperative for us to digitize them. They include musical recordings,
lectures, films and other unique and unidentified materials.
With recommendations from the digital assessment in hand, The
Library applied for, and was awarded, funding from the National
Endowment for the Humanities (NEH) to assess the audio-visual
collections. This assessment, conducted by specialists in the field of
sound preservation, in combination with cross-training of staff with
responsibilities related to the sound collections, has provided the best
foundation for conserving primary sound materials and converting
them, when necessary and using the best technology, to more
contemporary media.
The Library created an inventory of the collection and conducted an
assessment of its current status with sound
consultants Chris Lacinak, President of Audiovisual Preservation
Solutions, and Kevin O'Neill. An assessment report was produced
and staff was trained to handle the audio-visual collections and
manage their preservation (Lacinak and O'Neill, 2008).
With the recommendations of Audiovisual Preservation Solutions,
The Library has moved forward to the next step as laid out in the
assessment report. We are seeking funding to gain physical and
intellectual control of the sound collections. We have prioritized the
preservation of certain collections and have prepared selected items
for reformatting. Once funding is secured we will be able to perform
necessary conservation and reformatting, provide for digital
preservation, and make the recordings available on the web.
One of the most significant issues with which we need to grapple
is the recommendation of Audiovisual Preservation Solutions to
develop in-house capability for the reformatting of our sound
collections. This will require commitment from The Jewish
Theological Seminary to build capacity for reformatting on site in the
existing and underutilized sound studio. The institution would need to
repurpose the space and commit to allocating funding to acquire
equipment and staff the lab. The Report states that it would take eight
years to complete the reformatting of The Library's collection.
Subsequent to the funded assessment of the sound collection, The
Library received a large gift of a significant ethnomusicological
collection and submitted an application for the processing and
reformatting of these sound collections.
We received a grant from the Metropolitan New York Library
Council (METRO) to digitize a collection of 250 wedding poems,
unique poems composed for wedding parties, particularly in Italy
during the 17th-19th centuries. One of our staff members was familiar
with DigiTool, and he succeeded in bringing up the system with
the Wedding Poems on schedule in January 2008.
Setting guidelines and standards for the presentation of the
digital library generated debate amongst the staff. A standing
committee for creation of the Digital Collections met regularly to set
guidelines and monitor the development of the digital collections. The
committee consisted of the Director of Library Services,
Administrative Librarian for Technical Services, Administrative
Librarian for Special Collections and the Systems and Digitization
Librarian. Guidelines were set for a digital file naming convention, for
the quality of the digital image, and for the web interface of the digital
library. Importantly, decisions were made as to the level of metadata,
how the metadata would be presented (DigiTool presents metadata in
Dublin Core), where the metadata should be created if none existed,
and the extent to which existing metadata needed to be upgraded.
It was determined that The Library would only digitize materials
that were not under copyright. The exceptions to this are student
undergraduate senior theses and master's theses where we have
permission from the students to post their work.
Funding opportunities
A digital project comprises a number of parts, namely, metadata
creation, digitization of the objects, and bringing the objects up as part
of the digital library. Each component is labor intensive, and therefore
costly. One of the major costs that cannot be carried as part of the
library's budget is the actual digitization of materials. As a result,
external funding and partnerships are required for each special
project or collection. Funding has been obtained from various sources,
including public funding, private foundations, not-for-profit
organizations and individuals.
Two assessments have been completed and a third is in process with
grants from public sources. The digital readiness assessment was
funded by an IMLS grant awarded to the NEDCC. The NEH funded
the assessment of the audio-visual collection. Currently we are
evaluating the effectiveness of the use of our digital library with funds
from METRO, which receives its funding from New York State.
Individuals have donated funds to digitize specific collections
such as the bookplate collection, the Solomon Rosowsky Field
Recordings from pre-state Israel (1936 and 1938) and the recordings of
Cantor Samuel Hallegua from Cochin, India (1967-1968). This was all
achieved through unsolicited funds. Each of the individuals either had
a connection to the collection (the Hallegua recordings) or was
interested in the subject areas (bookplates and field recordings).
Funding has also been received in the form of "in kind" services. One
example is The Library's entire collection of 35,000 fragments from the
Cairo Genizah, literally a hidden treasure. These fragments come from a
storehouse of documents found in a synagogue in Cairo at the end of the
19th century. They were digitized through an international project, the
Friedberg Genizah Project, where photographers were sent to The
Library to work on site, but were paid directly by the external project.
A private foundation had particular interest in digitizing specific
"treasures" from The Library's collection. This enabled us to digitize a
number of The Library's most valuable and important illuminated
Hebrew manuscripts, including the Esslingen Mahzor (Ashkenaz, 13th
century), subsequently loaned to the Metropolitan Museum of Art
during 2008-2009, and the Rothschild Mahzor (Florence, 1490).
Funding for specific projects enables us to select the materials and
scale the project accordingly. When we began digitization in 2000 we
selected materials that could be digitized on a flatbed scanner. The
National Foundation for Jewish Culture awarded funding to digitize
items related to Jewish American culture. With two grants we
digitized 350 newspaper clippings from the US press from the 18th
and 19th centuries and 100 unique pamphlets from the late 18th and
19th centuries. With funding from METRO in 2008 we digitized 250
wedding poems. With current funding from METRO we are digitizing
the diaries, in 25 volumes, of the great Jewish theologian
Mordecai Kaplan.
Building Digital Projects
The "NINCH Guide to Good Practice in the Digital Representation
and Management of Cultural Heritage Materials" (The NINCH Guide
to Good Practice in the Digital Representation and Management of
Cultural Heritage Materials, 2002) served as a guide for planning
specific projects. It emphasizes the many components necessary to
create a successful digital project. Over the past decade, The Library
has learned through experience to improve the methodologies,
workflows, image and digital file formats and minimum metadata
necessary to create successful and sustainable digital projects.
Each project has served as a building block for improvement.
A medium-size academic library cannot sustain a full-service
digitization lab with cameras, scanners, and audio conversion
equipment that quickly become obsolete. The acquisition and maintenance of
technical equipment fell outside our core mission. Our collection
includes extraordinarily rare materials that cannot leave the premises.
Items on parchment cannot leave the temperature and humidity
controlled environment of the rare book room and special reading
room for extended periods of time.
We did purchase a flatbed scanner, a digital camera with lights,
and a book scanner. Most of this equipment is out of date but still
serves us well for in-house work.
Libraries are experienced in creating metadata for bibliographic
description of books and other materials. We need to apply those
methodologies to the digital object and learn how descriptions of
digital materials differ, or do not, from description of the physical
object. We have retooled and learned different metadata schemas
commonly used for digital collections such as Dublin Core (Dublin
Core Metadata Initiative, 2010) and MODS (Metadata Object
Description Schema, 2010).
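
As an illustration of what describing a digital object in Dublin Core can look like, the following is a minimal sketch in Python. It is not the procedure used at The Library or inside DigiTool. The record values are invented for illustration, and the `record` wrapper element and the function name are assumptions; only the fifteen element names and the namespace come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of simple Dublin Core.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dublin_core(fields):
    """Build an XML element holding one dc:* child per value.

    `fields` maps a Dublin Core element name to a list of values.
    The <record> wrapper is a hypothetical container, not a standard.
    """
    root = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Invented sample values, for illustration only.
sample = {
    "title": ["Wedding poem (hypothetical example)"],
    "type": ["Text"],
    "format": ["image/tiff"],
    "identifier": ["000123456"],
    "subject": ["Wedding poems", "Italy"],
}

xml_text = ET.tostring(to_dublin_core(sample), encoding="unicode")
print(xml_text)
```

Repeating an element, as with the two subject values here, is permitted in simple Dublin Core, which is one way it differs from a fixed-field catalog record.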
Prioritizing Projects
Because of the rarity and research value of the collections, many
institutions approach JTS with invitations to participate in
collaborative digital projects. Some funding sources have approached
the institution about specific digital projects. Even as we prioritize
which collections should be digitized, funding organizations and
individuals approach The Library with their own projects and request
our participation.
The following questions need to be asked before partnering:
Would a project provide complementary materials to JTS
holdings? What collaborations are possible? How important and
useful are the items to users? How does one assess usefulness?
The physical condition of the item and the ease and method of
digitization all play a role in prioritization. Do you digitize the
"popular" favorites of The Library or do you make little-known, hidden
and inaccessible collections available? When approached by a funding
group with a specific project, how do you evaluate the project in
relation to overall library priorities?
The Library's goal is to present a broad range of digital materials
reflecting the breadth of its collection. To date we are presenting
books, manuscripts, musical scores, sound recordings, archival
finding aids, flat items, pamphlets, and little-known and inaccessible
collections such as the wedding poems and bookplates.
Early Projects
The Library's early digitization projects involved flat objects
(newspaper clippings), pamphlets which were saved as .pdf files
(American Judaica pamphlets from the late 18th and 19th centuries) and
digitization from slides and transparencies (the ketubbah, or marriage
contract, collection).
Funding was secured for the digitization of the newspaper
clippings and the pamphlets and for the purchase of a flatbed
scanner. The files were all saved as .pdf files. Metadata existed in The
Library's ALEPH500 catalog. The .pdf files were loaded onto the
ALEPH server and links were made within the bibliographic record
utilizing the MARC 856 field.
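
The linking step described above can be sketched as follows. This is a hedged illustration in Python, not a record of how the links were entered in ALEPH. It formats a display-style MARC 856 field using first indicator 4 (HTTP access) and second indicator 1 (version of the resource), with subfields $q (electronic format type), $u (URI) and $z (public note). The URL, the note text and the function name are hypothetical.

```python
def marc_856(url, note=None, file_format=None):
    """Return a display-style MARC 856 field for a digitized version.

    Indicators "41": access by HTTP, link is to a version of the
    resource described by the bibliographic record.
    """
    subfields = []
    if file_format:
        subfields.append(("q", file_format))   # electronic format type
    subfields.append(("u", url))               # uniform resource identifier
    if note:
        subfields.append(("z", note))          # public note
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"856 41 {body}"


# Hypothetical URL and note, for illustration only.
field = marc_856(
    "http://example.org/pamphlets/000123456.pdf",
    note="View digitized pamphlet",
    file_format="application/pdf",
)
print(field)
```

A catalog holding several files for one item, such as the three image sizes described in the next paragraph, would simply carry one 856 field per file.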
Access to unique visual materials was a high priority. The
ketubbah (Jewish marriage contract) collection comprises more
than 500 contracts, many of which are illuminated or illustrated. 35
mm slides or 4 x 6 transparencies were available for about 450 of the
contracts, all of which were cataloged. A slide and transparency adapter
was purchased for the flatbed scanner and the images were easily scanned
in three sizes: thumbnail, medium and large. Each of the three files was
saved on the ALEPH server and then linked to the bibliographic
record. Users have access to the small, medium and large versions of
each of the marriage contracts.
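
The three-size approach can be sketched as a small calculation. The pixel limits below (150, 800 and 2000 on the longer side) are assumptions chosen for illustration; the chapter does not state the dimensions actually used. The function scales an image to fit within a limit while preserving its proportions and never enlarges it.

```python
# Assumed longest-side limits in pixels; not taken from the chapter.
DERIVATIVES = {"thumbnail": 150, "medium": 800, "large": 2000}


def fit_within(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # never upscale a small image
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def derivative_sizes(width, height):
    """Return the pixel dimensions of each derivative for one scan."""
    return {
        name: fit_within(width, height, side)
        for name, side in DERIVATIVES.items()
    }


sizes = derivative_sizes(3000, 4500)
print(sizes)
```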
Partnership Projects
Partnership projects give us the opportunity to have large
collections digitized with external funding and to utilize technology
that is not available in The Library. Crafting a partnership agreement
has been one of the lessons learned over the past five years. One of the
important issues emerging from these partnership projects is control
of the images. The partner may have a copy for use, generally on their
site, but JTS must own the images. Quality of the images is also an
area of concern with partnership projects.
We have participated in three major partnership projects, each
with successful outcomes but not without challenges.
Genizah fragments: A partnership agreement was signed with
The Friedberg Genizah Project (FGP) to digitize the 35,000 fragments
from the Cairo Genizah owned by JTS. These documents, discovered
at the end of the 19th century, are "a collection of fragmentary Jewish
texts stored in the loft of the ancient Ben Ezra Synagogue in Cairo,
Egypt between the 8th and 19th centuries. These manuscripts outline
a 1,000-year continuum of Middle-Eastern history and comprise the
largest and most diverse collection of medieval manuscripts in the
world." The FGP was established to facilitate and transform Genizah
research by identifying, cataloging, transcribing, and translating the
fragments, rendering them into digital format (i.e., photographing),
and publishing them online (Genizah Project, Executive Summary,
n.d.).
FGP hired digital photographers with state-of-the-art digital
cameras to produce 600 dpi color images of the 35,000 fragments in
The Library. They funded staff to prepare the material for digitization,
work with the digital photographers, re-shelve the material and
document the work. The photographers shot images for about four
months. All the fragments were shot with targets that included their
local identification number. The digital images were delivered in
.dng format for the repository copy and in .jpg format for access.
They are available, with registration, on the FGP site. JTS holds an
internal access copy but has not made it public because of the limited
metadata accompanying the images.
Lessons learned from this first partnership project: In order to
successfully complete a major digitization project one must seize an
opportunity. This was an outstanding opportunity to digitize the
35,000 Genizah fragments. The project was driven by the funder. They
determined the items to be digitized, the method of digitization, the
choice of photographer, the rate of digitization, etc. There was no
specific budget for the completion of this work nor was there a
designated time-frame. Work commenced and continued at a swift
pace until it was completed. The funder's representative wanted to be
involved in The Library's internal workflow. Often work was rushed
and not adequately planned. The most significant issue is that there is
no metadata for the Genizah fragments and the material is not freely
available. Only scholars who register, and are accepted by, the FGP
may search the information.
Library "treasures": Another funder selected the items
he was interested in digitizing, supplied the photographer
(who was approved by The Library) and delivered digital images in
TIFF format. While this project was funder driven, it supplied The
Library with images of some of its greatest manuscripts, such as the
Rothschild Mahzor (MS 8892) from Florence, 1490. A separate site,
Special Treasures from The Library of The Jewish Theological
Seminary (http://jtslibrarytreasures.org/), was launched with "turning
the page" technology. Only 50 of the 450 leaves are on display on
this site. The site also includes other digitized manuscripts from the
collection, which were made available with full metadata on The
Library's digital library in 2008.
Rare book digitization: A project to digitize very rare books,
incunabula and 16th century books, was completed during the fall of
2009. This project reaped the benefits of our experience. As this was
not a comprehensive project, books needed to be selected, posing a
more complicated workflow. The goal was to digitize only unique
books that had not been digitized by other libraries. Once they were
identified, the condition of the book was checked with guidelines from
the paper conservators. The books were digitized on a Zeutschel
OS12000 book scanner in black and white. Each book needed to have
a wide enough opening so that the image could be shot. A ruler was
placed on each page. All the books were cataloged and the file name
for the digital image was the ALEPH (our ILS) system number for the
book. The bibliographic records were retrieved from ALEPH.
Hebrewbooks.org, the organization that was performing the
digitization, delivered the digital images in TIFF and PDF formats. We
loaded the PDF files to our server and are now in the process of loading
them to the digital library. We exported the metadata and delivered it to
Hebrewbooks.org, which is loading the PDF files and metadata into
its collection of full-text religious books in Hebrew, which to date
totals more than 40,000 titles.
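
The naming rule used in this project, in which the ALEPH system number of the book serves as the file name, can be sketched as below. The nine-digit zero padding, the underscore, and the four-digit page sequence are assumptions added for illustration; the chapter states only that the system number was the file name.

```python
def page_filename(system_number, page, extension="tif"):
    """Build a file name from a catalog system number and page sequence.

    Assumed pattern: <9-digit system number>_<4-digit page>.<extension>
    """
    if page < 1:
        raise ValueError("page sequence starts at 1")
    return f"{int(system_number):09d}_{page:04d}.{extension}"


# Hypothetical system number, for illustration only.
print(page_filename(123456, 1))
print(page_filename("000123456", 12, "pdf"))
```

Tying the file name to the catalog identifier means the bibliographic record for any image can be retrieved without a separate lookup table.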
Lessons learned: We took far more control over the steps of this
project. The external funder did, however, choose the equipment used;
we would have preferred to digitize the books in color, utilizing a
digital camera. At the beginning of the project the digitization
technician was trained by The Library's paper conservator in order to
ensure careful handling of each book. Greater control over the choice
of materials, the way the materials were handled and preparation of
the metadata made this a successful project. During November 2009,
92 rare books were digitized.
Outsourcing
We have no capacity to digitize sound at JTS. We have digitized
two musical collections working with two external sources. One
collection was digitized by a private sound engineer and the other
project went to the lab at the Center for Jewish History, a
not-for-profit organization that had not, until our request, digitized
for other institutions. This proved to be beneficial for them as it
provided them with some income during a "down" period in their
digitization lab.
Plans are underway to send an archival collection, comprising
volumes of diaries, to the studio of a digital photographer that we have
worked with on numerous occasions.
The output of the digitization of the 252 Solomon Rosowsky Field
Recordings collection was in .mp3 format. We requested an archival
output. This resulted in a raw and sometimes difficult-to-hear output.
The metadata for this collection was created in an Excel spreadsheet
at the time of digitization and is not part of the catalog, as the
recordings are a subsection of an archival collection.
The quality of the sound in the Hallegua collection, digitized in
2009, proved to be far better. Partnering with the Center for
Jewish History and utilizing their digital lab, we accepted their
recommendations and created archival .wav copies and access .mp3
copies. The sound produced in the second project is
clearer than the first project. Metadata is yet to be created for this
collection. Once it is available we will make the collection accessible.
Library-initiated project:
When we have the luxury of planning and implementing a digital
project on our own terms, we have learned that this is the ideal
situation for us. The digitization of manuscripts from a pre-selected
list has been the most successful project, from a digitization point of
view, that we have embarked on. We selected important manuscripts
that had no surrogates. The processes of the project were reviewed by
a project team that included paper conservators, a conservation
assistant, a curator, special collections reading room staff, the systems
and digitization librarian, the heads of technical services and special
collections, and the director of library services as chair. The goal was to
look at the entire scenario of bringing in a digital photographer to
shoot 4,000 manuscript leaves over a two-week period. We planned
the space for digitization, the spreadsheet of targets, the physical
targets, workflow for pulling items and reshelving them, assistance for
the digital photographer, metadata for the manuscripts, and plan for
bringing up the digital objects in the digital library.
The planning meeting raised questions from all the stakeholders.
If there were any outstanding questions, all stakeholders were later
updated on the answer or resolution. Meeting with the entire group
enabled everyone involved to know exactly who had a stake in the
project. It allayed any fears and anxieties. Bringing the team together
enabled a smooth process for the preparation of the materials for
digitization, the actual digitization and the preparation of the
metadata. The final piece, bringing the digital objects up in the digital
library posed other problems regarding staffing.
Challenges
Much has been accomplished but not without challenges. Hardware
and software capacity, staffing and funding are all issues that we
continue to grapple with.
Capacity issues
The capacity of The Library's server that stores digital objects and
the capacity of the library's license for its DigiTool installation are
both limited. Issues related to capacity entail actual storage of the
digital images. Often one has different manifestations for a single
image or multiple images for a single document - for example, for a
five-page document that was shot as a RAW file there are TIFF
images, which in turn may generate JPEG files and also a PDF.
The number of versions of the same document that one needs to
keep is determined by capacity. There is the repository server where
one stores the images. The number of manifestations that are saved
there may be determined by the institution's capacity. One must work
with those responsible for the institution's servers.
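
A rough capacity estimate of the kind discussed above can be worked out before a shoot. The sketch below computes the uncompressed size of one image from its physical dimensions, resolution and bit depth, and scales it to a whole project. The 8 x 10 inch leaf size is an assumption for illustration; the 600 dpi resolution and the figure of 4,000 leaves echo numbers mentioned elsewhere in this chapter, but they were not reported together for a single project. Real TIFF, JPEG and PDF files will differ because of compression and file overhead.

```python
def uncompressed_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Approximate uncompressed size in bytes of one scanned image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * channels * bits_per_channel // 8


def collection_gib(item_count, bytes_per_item):
    """Total size of a set of images, in GiB (2**30 bytes)."""
    return item_count * bytes_per_item / 2**30


# Assumed leaf size of 8 x 10 inches, 24-bit color, 600 dpi.
per_leaf = uncompressed_bytes(8, 10, 600)
total = collection_gib(4000, per_leaf)
print(per_leaf, "bytes per leaf")
print(round(total, 1), "GiB for 4,000 leaves")
```

Each additional manifestation kept on the repository server (for example, a JPEG and a PDF alongside the TIFF) adds to this total, which is why the number of versions retained is a capacity decision.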
Staffing
Staffing is key to the success of each digital project. The key to
building a successful team for the digital library is for each team
member to be cognizant of their specific role in the project.
The team consists of curators who select materials for
digitization. The selections are determined by funding, importance,
fragility of the item, value of material, or by other criteria.
Once the items are selected they need to be assessed by the
paper conservators (for items on paper and parchment). They look
at the physical stability of the item. They can recommend if an item
can be digitized, with recommended equipment (scanner, book
scanner, photographer). They can also recommend if an item needs to
be encapsulated in Mylar before digitization, and/ or if an item needs
to have a conservation assistant working with the photographer
throughout the shoot.
Special collections staff is responsible for pulling and shelving
books. They need a complete and accurate list of the items to be
scanned. For a longer-term project, they need to prepare materials
each day to be digitized and materials that need to be reshelved. If the
project is based on a list of items found in a certain collection, they
may often need to search for the item.
Digital photographer or scanner. Generally, this person is
not part of The Library's staff, but is an outside contractor or a
temporary worker hired to do the specific work. They need to be
integrated into the working environment of The Library, which may
include the physical location of the equipment, the rules and hours of
the special collection reading room.
Technical assistant to assist the photographer with turning
pages, preparing the shoot, and other needs surrounding the shoot.
Cataloger and metadata librarian needs to review the
current metadata for the object and make appropriate adjustments to
the metadata for the digital library based on the standards
established. Some of the metadata added includes genre, collection,
and location of digital object. In some cases the metadata needs to be
upgraded, but this is subject to great debate amongst library staff. In
most cases it depends on the amount of existing metadata and the
object in hand.
Systems and digitization librarian - Needs to export
metadata and deliver it to the cataloger and metadata librarian and
needs to prepare the materials for ingest into the digital library.
Since the existence of the JTS digital library, in January 2008, we
have had three systems and digitization librarians. Because of the staff
turnover, it has been challenging to complete many of our digital
projects and make them available online. Since October 2009, with a
new person in this position, new workflows are being documented and
the actual ingest of digital objects is being completed by other
members of the staff under the guidance of the systems and
digitization librarian, rather than by the person holding that position.
The position has evolved, with the initial incumbent successfully
building and launching the digital library. Although the digital library
was launched and initial documents were added to the collection,
there was no documentation of the processes used, which left the
second person in a difficult situation. There was little activity in
loading objects during that year. With the third person hired in the fall
of 2009 it was clear that the first step was to create workflows and to
clearly document them for others to follow and for The Library to have
in order to successfully maintain the digital library. Lessons learned:
stop and document what you are doing and share the documentation
and the work.
From Digital Projects to a Digital Program
As we move into the second decade of digitization these are some of
the strategic directions that we need to address.
Assessment: With funding from METRO, The Library will assess
the effectiveness of its digital collections in DigiTool. A survey is
currently being built and will be posted on the site and sent out to a
wide group of constituents. The plan is to implement
recommendations from the survey, add new collections to the site,
most notably the Mordecai Kaplan diaries, and then survey the
effectiveness at the end of the grant period. We hope that this
assessment will help guide us in our work.
Documentation: We are documenting the workflows that we are
establishing for the creation of digital collections. With the
documentation, and with adequate funding, we will be able to hire
staff to work on building the digital collections.
Digital preservation: In the winter of 2008 we participated in the
Sustaining Digital Preservation workshop funded by the IMLS. We
also attended a METRO meeting in the fall of 2008 focused on digital
preservation. Conversations have begun with ExLibris about their new
digital preservation software. We need first and foremost to obtain the
commitment of the JTS administration to invest in digital
preservation. This is challenging, considering the serious nature of the
institution's finances in the wake of the economic downturn in the US.
Little work has been done at JTS to address these needs.
New projects, improved workflows, increased capacity, and
additional staff to enrich collections with metadata are all necessary
aspects of a successful digitization program. Our hope is that with creativity,
imagination, thoughtfulness and good luck, organizations drawn to
the uniqueness and richness of our collections will continue funding
The Library of The Jewish Theological Seminary in this important
work.
References
About DSpace. (n.d.). Retrieved March 9, 2010, from
http://www.dspace.org/about-dspace/introducing/
Clareson, Tom and Kim, Richard. (2006). Digital Program Survey
Final Report from April 5, 2006 Visit.
Dublin Core metadata initiative. (2010). Retrieved March 30, 2010,
from http://dublincore.org
ExLibris DigiTool. (n.d.). Retrieved March 9, 2010, from
http://www.exlibrisgroup.com/category/DigiToolOverview
Genizah project, executive summary. (n.d.). Retrieved March 30,
2010, from http://www.genizah.org/about-Executive_Summary.htm
Google Books. (2010). Retrieved March 13, 2010, from
http://books.google.com/
Greenstone digital library software. (n.d.). Retrieved March 9, 2010,
from http://www.greenstone.org/
Lacinak, Chris and O'Neill, Kevin. (2008). Sound Recording
Consultation Final Report, June 26, 2008.
Metadata object description schema. (2010). Retrieved December 11,
2009, from the official web site
http://www.loc.gov/standards/mods/
Strategic Plan. (2005). The Library of The Jewish Theological
Seminary. 2005.
The NINCH guide to good practice in the digital representation and
management of cultural heritage materials. (2002). Retrieved
May 9, 2010, from
http://www.nyu.edu/its/humanities/ninchguide/
Local Objects, Local People, Local History:
Creating the Wisconsin Decorative Arts
Database
Emily Pfotenhauer (Wisconsin Heritage Online, Wisconsin
Historical Society and the Chipstone Foundation)
Abstract
The Wisconsin Decorative Arts Database is a digital collection of
three-dimensional artifacts from the collections of historical societies
and museums throughout Wisconsin, hosted by the Wisconsin
Historical Society and supported by the Chipstone Foundation. Since
2006, the project has documented nearly 1,000 examples of furniture,
ceramics, textiles and other decorative arts made by early Wisconsin
craftspeople and held in the collections of over 40 institutions
throughout the state. This case study examines the genesis of the
project, the photographic standards and metadata specifications
established for object documentation, and the unique challenges of
developing a diverse digital collection of museum artifacts from a wide
variety of local and regional institutions.
Keywords: Artifacts, Collaboration, Historical societies, Local
history, Metadata, Museums, Photography
In the past decade, the work of state- and regionally-based digitization
programs across the country has resulted in an explosion of local
history resources available online. These initiatives are collaborative
efforts among libraries, archives, and museums to make their
Local Objects, Local People, Local History 173
collections freely available to a broad audience of students, teachers,
historians, and genealogists. Some of the earliest and most influential
of these programs include the Maine Historical Society's Maine
Memory Network, the Minnesota Digital Library, and the Colorado
Digitization Project (now known as the Collaborative Digitization
Program at the Bibliographic Center for Research). For the most part,
programs such as these have focused on what some museum
professionals refer to as "the flat stuff": photographs, books,
manuscripts, and other two-dimensional materials. This focus on the
digitization of archival materials belies the fact that three-dimensional
artifacts form the core of the collections of most local historical
societies and museums. Moreover, the everyday objects people create
and consume (for example, furniture, clothing, and tools) are
considered by many scholars to offer significant evidence for historic
research, as much if not more so than historic photographs or original
manuscripts. Yet most institutions participating in collaborative
digitization programs have not made their collections of three-
dimensional objects available online at the same rate as their two-
dimensional holdings.
Case studies of statewide and national digitization programs
reveal the challenges common to most collaborative digital endeavors,
regardless of the types of material being digitized. Roberto (2008)
describes the lack of consistent standards in place for describing
collections among even the largest and best-staffed museums in Great
Britain. When collaborations are expanded to include small, local
historical societies and museums, the creation of consistent
collections data becomes even more challenging due to a lack of
available staff, resources, and expertise (Rowe & Barnicoat, 2009).
These obstacles are compounded when working with three-
dimensional objects, which require more space, equipment, and
technical expertise to photograph and more research and subject
expertise to catalog.
This chapter examines the development of one digital collection
of three-dimensional objects from multiple institutions: the Wisconsin
Decorative Arts Database (http://content.wisconsinhistory.org/
decorativearts). This project was initiated in 2006 by the Wisconsin
Historical Society and the Chipstone Foundation of Milwaukee, a
private foundation for the study of American decorative arts and
material culture. Chipstone and the Society had three goals in mind at
the outset of the project: 1) to bring to light objects held in the
collections of Wisconsin historical societies and museums and share
them with a broader audience, 2) to document examples of furniture,
ceramics, textiles, and other decorative arts made by nineteenth and
early twentieth-century Wisconsin craftspeople in order to reveal
settlement patterns and the persistence of handicraft traditions in the
state, and 3) to add the first three-dimensional artifacts to Wisconsin
Heritage Online (http://www.wisconsinheritage.org), a research
portal that brings together a diverse range of digital collections from
Wisconsin libraries, archives, museums, and historic sites.
The Wisconsin Decorative Arts Database is founded on the work
of a multitude of "real world" individuals— the staff and volunteers at
local historical societies and small museums throughout Wisconsin
who work to preserve and share their collections, as well as the men
and women who built the chairs, stitched the quilts, and threw the
pots that are recorded in the database. In addition, the story of the
Wisconsin Decorative Arts Database is a highly personal "real world"
story for me, as the project grew out of my interest in the material
culture of my home state.
The seeds were planted in 2005 while I was a graduate student
studying American furniture and decorative arts in the Department of
Art History at the University of Wisconsin-Madison. The time had
come for me to choose a subject for my master's thesis research, and I
wanted the opportunity to study objects hands-on and up close. I
decided to focus on furniture made in the community of Mineral
Point, an important early trade center in southwest Wisconsin— as
well as the town where I grew up. During the year I spent researching
and writing my thesis, I unearthed a surprising amount of locally-
made furniture still surviving in both public and private collections
and met numerous Mineral Point residents dedicated to preserving
their community's distinctive history.
This search for furniture made in Mineral Point caught the
attention of the Chipstone Foundation of Milwaukee. In 2006,
Chipstone approached the Wisconsin Historical Society with the idea
to create an expanded fieldwork program that would include
furniture, ceramics, textiles, and metalwork made by craftspeople
throughout Wisconsin between 1820 and 1920. From the beginning, a
central focus of both Chipstone and the Society was to make these
findings freely available to the public in digital form. Both institutions
were already leaders in the digitization of cultural heritage materials:
Chipstone with its Digital Library for the Decorative Arts and
Material Culture (http://digital.library.wisc.edu/1711.dl/DLDecArts),
developed in collaboration with the University of Wisconsin Digital
Collections Center, and the Society with a number of innovative
digitization efforts, including the award-winning American Journeys
(http://www.americanjourneys.org) and Turning Points in Wisconsin
History (http://www.wisconsinhistory.org/turningpoints). A digital
database of museum collections was therefore a natural extension of both
institutions' missions.
I was appointed as Chipstone's Charles Hummel Fellow to
manage all aspects of the initiative— fieldwork, photography, research
and writing as well as the structure and organization of the digital
collection. As my background was in art history, not digital
collections, I had a great deal to tackle in order to create an effective
online resource. What kinds of materials would be included and how
would they be located? How would each object be imaged? How would
users search the database? What kinds of metadata should be
included? And what exactly was metadata, anyway?
Fortunately, a few pieces of the puzzle were already in place. The
Wisconsin Historical Society would host the collection in
CONTENTdm, a content management system already in use by the
Society's Library-Archives division. Wisconsin Heritage Online
(WHO), a developing statewide digitization program, would also
harvest the collection into its central web portal. WHO's Metadata
Guidelines (2006) provided a useful introduction to creating and
organizing digital content, as did the Collaborative Digitization
Program's Dublin Core Metadata Best Practices (2006). However,
these guidelines were established specifically for images and text-
based materials, not three-dimensional artifacts.
Translating information about chairs, quilts, and pots into the
standardized fields recommended by existing guidelines was a bit like
trying to fit square pegs into round holes. Developing a crosswalk or a
systematized data import process was impossible, because no two
museums or local historical societies handle cataloging in the same
way. In fact, many institutions do not even use a digital collection
management system, instead relying on typed index cards or
handwritten ledgers to catalog their collections.
I developed a standard set of metadata elements based on WHO
recommendations, mapped them to the fields defined by the Dublin
Core (http://dublincore.org/documents/dcmi-terms), and assigned
customized local field names. For example, the Dublin Core Creator
element became a local "Maker" field, while Dublin Core's
Format.medium became a field called "Materials and Techniques." In
addition to providing images and physical descriptions of each
documented object, a central focus of the project was to place each
artifact in its historical context by conducting research on the
biographies of makers and owners as well as design influences and
cultural meanings. This goal called for the creation of multiple Dublin
Core Description fields: one for physical description, one for historical
information, and one for the citation of research sources.
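The field mapping described above can be sketched as a small lookup table. This is only an illustration, not the project's actual CONTENTdm configuration: "Maker" and "Materials and Techniques" are the local names given in the text, while the three description field names and the sample values are hypothetical.

```python
# Local field name -> Dublin Core element.
# "Maker" and "Materials and Techniques" come from the text; the three
# description field names are hypothetical labels used for illustration.
LOCAL_TO_DC = {
    "Maker": "creator",
    "Materials and Techniques": "format.medium",
    "Physical Description": "description",
    "Historical Notes": "description",
    "Sources": "description",
}


def to_dublin_core(record):
    """Return a list of (dc_element, value) pairs for one catalog record.

    A list of pairs is used instead of a dict because several local
    fields map to the same Dublin Core element (description).
    Unmapped or empty fields are skipped.
    """
    pairs = []
    for field, value in record.items():
        element = LOCAL_TO_DC.get(field)
        if element is not None and value:
            pairs.append((element, value))
    return pairs


# Invented sample record, for illustration only.
example = {
    "Maker": "Unknown maker",
    "Materials and Techniques": "Pine, painted",
    "Physical Description": "Side chair with plank seat.",
    "Sources": "Federal census records.",
}
dc_pairs = to_dublin_core(example)
```

Keeping the local labels separate from the Dublin Core elements lets the public display use museum-friendly names while a harvester such as a statewide portal still receives standard elements.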
Another key cataloging decision was the selection of a controlled
vocabulary appropriate to the database content. Because it is built into
PastPerfect Museum Software, a popular content management
system, The Revised Nomenclature for Museum Cataloging, aka
Chenhall's Nomenclature (Blackaby, Greeno, and the Nomenclature
Committee, 1995) is used by many local historical societies and small
museums in Wisconsin. However, the Getty Research Institute's Art
and Architecture Thesaurus (http://www.getty.edu/research/
conducting_research/vocabularies/aat) provides more nuanced and
specific terms for decorative arts objects. A significant level of
granularity is required to accurately assign object names and subject
headings in the database. Using Chenhall's Nomenclature, a chair can
be described primarily according to function, e.g. "Chair, Dining" or
"Chair, Side." The AAT allows for more precise descriptions of form
and style. For example, a chair can be distinguished as a bow-back
Windsor chair, a fan-back Windsor chair, or a sack-back Windsor
chair. While the differences may appear minute to a general audience,
these kinds of distinctions are essential to the work of decorative arts
scholars and collectors. A revised and updated edition of Chenhall's
Nomenclature, Nomenclature 3.0, offering a greatly expanded
vocabulary for the description of museum artifacts, was released in
late 2009. Future subject headings provided in Wisconsin Decorative
Arts Database metadata may incorporate terms from this new edition.
One major question remained before any fieldwork could begin:
how to create quality digital images of objects that not only varied
widely in size and format but were scattered across the state? Just as
few institutions could offer detailed digital catalog records for their
collections, few had the resources or expertise available to image their
collections extensively, if at all. In addition, as with its metadata
guidelines, the WHO Digital Imaging Guidelines (2006) addressed
the scanning of two-dimensional materials almost exclusively and
offered little support for photographing three-dimensional objects.
Much of the available documentation on imaging museum artifacts,
such as Stanford University's Digital Michelangelo Project (Levoy &
Garcia-Molina, 2000) and the work of the Graphics Lab at the
University of Southern California's Institute for Creative Technologies
(Hawkins, Cohen, & Debevec, 2001), centered on the creation of
complex three-dimensional digital renderings using multiple cameras
and laser scanning.
It was necessary to develop a simple, portable imaging approach
so that photography could be completed quickly (to adapt to the
limited schedules of volunteers and busy staff) and unobtrusively (to
accommodate the often cramped storage quarters of small
institutions). I arrived at my photographic process through a
combination of consultation with professional photographers and on-
site trial and error. Jim Wildeman, a photographer based in Madison,
provided a basic introduction to artifact photography. Selecting the
right equipment for the job was essential. The Chipstone Foundation
funded the purchase of a Canon Rebel XTi digital SLR camera as well
as a tripod, a remote shutter release, a neutral gray paper backdrop
and backdrop stand, and two stationary tungsten lights on adjustable
stands. Indispensable incidentals included a long extension cord, tape
measures (both fabric and metal), white cotton gloves for handling
items, and cloths and brushes for removing dust from artifacts. The
entire photography kit could be packed into the trunk of a car and
quickly set up on location. Participating institutions need only
provide an electrical outlet and suitable working space.
After the framework of the digital collection was laid out and
procedures for photographing and cataloging artifacts were
determined, on-site work could begin. Based on my experience with
local historic sites and individual collectors in Mineral Point, I knew
that putting out a general call for material via an email listserv would
yield minimal results. It was much more effective to contact each
potential contributor directly, usually with an introductory email or
letter and then a follow-up phone call to discuss possible objects
before scheduling an in-person visit. Once on site, objects were
selected for documentation based on the availability of documented
provenance, oral history associated with the item, or stylistic
similarities to other known objects.
I began my fieldwork with two familiar sites from my graduate
research: the Mineral Point Historical Society and Pendarvis, the
Wisconsin state historic site in Mineral Point. Starting with these sites
gave me the opportunity to refine my documentation techniques in a
comfortable environment. I could take as long as I needed to make
minute adjustments in lighting or re-measure the dimensions of a
chair or flowerpot. I connected with the next three participants, all
mid-sized regional museums, via word-of-mouth recommendations:
the Neville Public Museum of Brown County in Green Bay, the
Chippewa Valley Museum in Eau Claire, and the Sheboygan County
Historical Society.
After working with this handful of pilot institutions, it was clear
that the phrase "decorative arts," while a common academic term, was
something of a stumbling block for potential participants. Some
possible contributors read the phrase too narrowly, expressing
concern that the objects in their collections were not "decorative"
enough, while others interpreted it too broadly, hoping to include
items manufactured outside of Wisconsin but used locally or objects
that lacked any known history whatsoever. At the same time, I was
discovering many artifacts that did not fit my original parameters but
were of compelling historical significance.
Although the title of the database remained the same, the
selection criteria were adapted to accommodate objects that offered
important evidence of Wisconsin craft practices and industrial
development, even if they did not fit into an academic definition of the
decorative arts. For example, the staff of the Historic Blooming Grove
Historical Society in Madison contacted me about their collection of
materials from the Frank J. Hess and Sons Cooperage, one of the last
manufactories in the nation to produce hand-hewn white oak beer
barrels. While not "decorative arts" in the traditional sense of the
term, the Hess barrels reveal a fascinating story of an immigrant
craftsman who brought a traditional European craft practice to
Wisconsin, adapted it to the local environment, and passed it on to a
second generation (Holland, 1966).
Although I received a number of word-of-mouth
recommendations and direct contacts, most database participants
were located through the Wisconsin Historical Society's Directory of
Wisconsin Local History Organizations
(http://www.wisconsinhistory.org/localhistory/directory). The Society's local
affiliates include more than 350 county historical societies, local
historical societies, community museums and historic house
museums. The vast majority of these institutions are run by volunteers
or a tiny cohort of part-time staff. Many are open to the public
seasonally and for limited hours, sometimes as little as one or two
weekends each month. This meant that arranging access to collections
hinged on flexible scheduling and the willingness of staff and
volunteers to open their doors during non-public hours.
The fieldwork process, an in-person visit to select, photograph,
and document individual objects, is the most rewarding, yet also most
exhausting, part of the project. From September 2006 through June
2009, I logged over 10,000 miles traveling to sites across the state. I
sifted through all manner of collection storage facilities, some
meticulously organized and others packed to the gills with artifacts
piled on shelves and stuffed in boxes. I handled thousands of artifacts
and toured all kinds of historic buildings. Most excitingly, I met
dozens of people who were passionate about their work, dedicated to
preserving the history of their communities, and eager to share their
collections with new audiences.
Depending on the number of items selected for documentation, a
site visit can last from a few hours up to two or three full days.
Preparing the material to post online takes much longer. A series of
graduate and undergraduate interns from the Material Culture
Program and the School of Library and Information Studies at the
University of Wisconsin-Madison have helped support this phase of
the project. Basic editing procedures, including cropping, sharpening,
and color balancing, are performed on each image using Adobe
Photoshop or ACDSee. Images and corresponding metadata are
uploaded directly to the Wisconsin Historical Society's CONTENTdm
server.
A central mission of the database is to establish an historic
narrative for each artifact by investigating the history of the object's
maker and owner as well as its cultural significance and formal
influences. Due to the myriad of primary source documents and other
historic material now available online, a major portion of this research
has been conducted solely in the digital realm. The most useful
sources for biographical and genealogical information proved to be
the Wisconsin Historical Society's Genealogy Index
(http://www.wisconsinhistory.org/vitalrecords), offering pre-1907 birth,
death, and marriage records for the state of Wisconsin, and the
full-text-searchable Wisconsin County Histories
(http://www.wisconsinhistory.org/wch). The commercial website Ancestry.com
was also valuable for searching federal census records.
Participating institutions are required to sign a Memorandum of
Understanding that authorizes the Wisconsin Decorative Arts
Database to photograph objects and publish those photographs and
associated metadata online. Each institution receives an archival-
quality CD containing the image files and retains full rights to those
images. (A second CD, along with any paperwork generated during the
research process, is archived at the Wisconsin Historical Society and
the digital images are backed up on the Society's servers.) For the
most part, participants have readily agreed to these terms. However, a
few have expressed concern about making material from their
collections freely available online, citing either a desire to retain
control over digital images or a fear of theft of the physical collection.
In most cases, these fears were allayed with the argument that the
benefits of participation— particularly the increased opportunity for
exposure and publicity— outweigh the risks.
Participation offers historical societies and museums the
important benefit of an increased online presence. Most participating
institutions maintain their own websites, but many lack the resources
to fill these sites with substantive information about their collections.
To increase online exposure, each contributor is clearly identified at
several points in the database. The metadata for each catalog entry
includes an "Owner" field as well as a rights statement with a link to
the institution's own website. The opening page offers users the option
to browse the database by selecting a specific participant's collection.
Another page functions as a directory of content contributors, with an
image and description of each institution as well as a link to its own
website.
Two popular free web features— a blog hosted by WordPress and a
gallery on the photo-sharing site Flickr— supplement the database and
extend its presence online. I use the blog, Wisconsin Object
(http://wisconsinobject.wordpress.com), to chronicle my travels and
announce when new content has been uploaded to the database. It
also provides a venue for more lengthy examinations of selected
artifacts and makers. The Wisconsin Decorative Arts Flickr gallery
(http://www.flickr.com/photos/wisconsindecarts), which is linked
from the blog, presents examples of recently uploaded database
content. While not numerous, user comments on both the blog and
the image gallery have provided an important window into the
audiences for this material and the types of objects they find most
compelling. For example, the most-viewed image on Flickr is a
colorful beaded bandolier bag made by Great Lakes Indians in the late
nineteenth century, now in the collection of the Wisconsin Historical
Museum. A corresponding blog entry on the cultural significance of
bandolier bags is one of the most popular posts. Users who have
commented on the blog or added the image to their own Flickr
galleries include craft hobbyists and collectors— two major audiences
that were not considered at the outset of the project.
The Wisconsin Decorative Arts Database's regional approach to
material culture research is nothing new. Since the 1970s, researchers
throughout the United States have worked to document decorative
arts made locally and held in local collections. Most notably, the
Museum of Early Southern Decorative Arts (MESDA) in Winston-
Salem, North Carolina established a fieldwork program in the 1970s
and 1980s, supported by the National Endowment for the
Humanities, to locate artifacts made by craftspeople working in seven
southern states before 1820 (Niven, 2001). This research yielded a
vast paper-based archive of photographs and files that now serves as
an indispensable resource for scholars studying the material culture of
the American South.
While modeled on the work of MESDA and other statewide
fieldwork initiatives, the Wisconsin Decorative Arts Database is the
first program of its kind to be created entirely in digital form. An
online database of regional artifacts offers a number of advantages
over a paper archive or a published catalog. New content can be added
continually, revisions can be made in response to new research, and
related materials can be brought together via hyperlinks. Moreover,
making collections available digitally helps small, local cultural
heritage institutions remain relevant to students, historians, and other
audiences whose research is grounded in the ready availability of
online resources.
Since 2006, nearly 1,000 catalog entries have been added to the
Wisconsin Decorative Arts Database, representing artifacts from the
collections of 40 historical societies and museums throughout
Wisconsin. In September 2009, the database was recognized with an
Award of Merit from the American Association for State and Local
History. With financial support committed by the Chipstone
Foundation and the Kaufman Americana Foundation through June
2011, the program will continue to unearth important evidence of
early craft production in Wisconsin. In addition to adding new
content, explorations are underway to develop creative ways to
expand and promote the project, including both online and bricks-
and-mortar exhibitions and publications.
References
Blackaby, J. R., Greeno, P., & The Nomenclature Committee. (1995).
The revised nomenclature for museum cataloging: A revised and
expanded version of Robert G. Chenhall's system for classifying
man-made objects. Walnut Creek, CA: AltaMira Press.
Bourcier, P., Rogers, R., & The Nomenclature Committee. (2009).
Nomenclature 3.0 for museum cataloging: Third edition of Robert
G. Chenhall's system for classifying man-made objects. Walnut
Creek, CA: AltaMira Press.
Collaborative Digitization Program Metadata Working Group. (2006).
Dublin Core metadata best practices (version 2.1.1). Retrieved
from http://www.bcr.org/dps/cdp
Hawkins, T., Cohen, J., & Debevec, P. (2001). A photometric approach
to digitizing cultural artifacts. In Proceedings of the 2001
Conference on Virtual Reality, Archeology, and Cultural Heritage,
333-342. New York: Association for Computing Machinery.
doi:10.1145/584993.585053
Holland, R. (1966). The last wooden beer barrels. Wisconsin Tales
and Trails 7, 12-16.
Levoy, M., & Garcia-Molina, H. (2000). Creating digital archives of
3D artworks [White paper submitted to the National Science
Foundation's Digital Libraries Initiative]. Retrieved from
http://graphics.stanford.edu/projects/dli/white-paper/dli.html
Miller, S. J. (Ed.). (2006). Wisconsin Heritage Online metadata
guidelines (version 2.0). Retrieved from
https://wiheritage.pbworks.com
Niven, P. (2001). Frank L. Horton and the roads to MESDA. Journal
of Early Southern Decorative Arts, 27, 1-147.
Roberto, F. (2008). Exploring museum collections on-line: The
quantitative method. In J. Trant and D. Bearman (Eds.), Museums
and the Web 2008: Proceedings. Toronto: Archives & Museum
Informatics. Retrieved from
http://www.archimuse.com/mw2008/papers/roberto/roberto.html
Rowe, P., & Barnicoat, W. (2009). NZMuseums: Showcasing the
collections of all New Zealand museums. In J. Trant and D.
Bearman (Eds.), Museums and the Web 2009: Proceedings.
Toronto: Archives & Museum Informatics. Retrieved from
http://www.archimuse.com/mw2009/papers/rowe/rowe.html
Digitizing the Rare Book Collection of the
Leo Baeck Institute
Renate Evers (Leo Baeck Institute, New York)
Abstract
The article focuses on typical problems encountered during the
establishment of a digitization program for rare books at a small
special library - from finding the best solutions for rare book cradles
to organizational and technical challenges and pitfalls.
Keywords: Digitization, Judaica Collection, Rare Book Collection,
Special Library
The Library of the Leo Baeck Institute:
A Medium-sized collection
The Library of the Leo Baeck Institute (LBI) in New York is
internationally recognized as the most comprehensive repository for
books documenting the history and culture of German-speaking
Jewry. Over 80,000 volumes and 900 periodical titles provide
important primary and secondary material. Most of the collection
deals with central European Jewry during the nineteenth and early
twentieth centuries. It also includes material dating back as far as the
16th century as well as material as current as the Jewish population in
Germany today. The focus of our collection is on the diverse culture of
German-speaking Jewry, especially in the arts, sciences, literature, philosophy,
and religion. The Institute was established in 1955 in New York, one of
the largest centers of the exiled German-Jewish community. In 2000
the Leo Baeck Institute became part of the Center for Jewish History,
an umbrella organization for 5 Jewish research institutions.
The Rare Book Collection of the Leo
Baeck Institute
The Rare Book Collection of the Leo Baeck Institute (LBI) consists of
ca. 2,000 volumes primarily in the field of German Judaica, dating
from the earliest period of printing in the 15th century and continuing
through the annihilation of European Jewry under the Third Reich.
The collection is rich in rarities, ranging from early 16th century
writings to Moses Mendelssohn and Heinrich Heine, and includes first
editions and dedication copies of works by more recent prominent writers.
Many of its volumes were salvaged from famous Jewish libraries that were
confiscated and dispersed by the Nazis. In many cases these primary sources cannot
be found anywhere else in the world. An important part of the rare
book collection is a collection of illustrated and art books. Among
them are many limited editions of twentieth century artists' portfolios
and several illustrated eighteenth century books on Jewish customs.
The Institute also has a significant collection of books dealing with
"Science of Judaism" (Wissenschaft des Judentums). The emergence
and establishment of modern Jewish scholarship in the 19th and 20th
centuries in Germany and Europe became the cornerstone and
predecessor of Jewish Studies.
Project: Digitizing the Rare Book Collection of the
Leo Baeck Institute
The goal is to digitize the rare book collection in order to make these
exceptional primary sources easily available to researchers on-site and
off-site, and to minimize use of the originals. To begin, the focus is on
titles which are in highest demand.
We started a pilot project with a selection of 38 books in 2005
when the Leo Baeck Institute was awarded one of the first grants in
the newly established Metro Digitization Grants Program. The
experience gained in this project enabled us to continuously enlarge
our digital rare book collection. Working closely with scholars in the
field and using feedback from our users in order to identify a
meaningful selection, so far we have successfully digitized 167 rare
books (about 22,000 pages) from our rare books collection. Among
those books are 33 illustrated and art books which were funded by
another Metro Grant awarded in 2008. By now we have worked out
the technical details and workflows for our digitization program and
are identifying new funding sources in order to continue the process.
Newly digitized books are continually being added to our digital
management system and made available on the Internet.
Future projects: In 2008 the Archives at the Leo Baeck Institute
embarked on an interesting new project, "DigiBaeck," which is
intended to put all LBI archival holdings onto the World Wide Web in
close cooperation with the Internet Archive. Within this framework we
are considering digitizing parts of our book collection as well. In
addition, we are collaborating with other libraries and have applied for
shared digitization projects.
Web interface and Examples
The digitized books are accessible through the web interface of our
digital management system Digitool (Ex Libris the Bridge to
Knowledge, 2010). The books can be retrieved via a search interface
or can be browsed by various categories. Both the bibliographic data and
the full text are searchable, as most of the books underwent OCR
(optical character recognition).
Additionally, we reference the digital objects in our online library
catalog records by adding the URL in MARC field 856 (see 856 -
Electronic Location and Access, 2003).
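As a rough illustration of such a link, the sketch below formats an 856 field as a line of text. It is not taken from LBI's catalog: the URL is a placeholder, the plain-text layout is an assumption modeled on common MARC mnemonic displays, and the indicator choice (first indicator 4 for HTTP access, second indicator 1 for a digitized version of the cataloged print resource) is one plausible reading of the MARC 21 definition.

```python
def marc_856(url, note=None, ind1="4", ind2="1"):
    """Format a MARC 856 field as one line of text.

    ind1 "4" = access method HTTP.
    ind2 "1" = the link points to a version of the cataloged resource.
    $u carries the URI; $z carries an optional public note.
    The text layout itself is an illustrative assumption.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an HTTP(S) URL")
    field = f"=856  {ind1}{ind2}$u{url}"
    if note:
        field += f"$z{note}"
    return field


# Placeholder URL, not a real LBI address.
line = marc_856("http://example.org/digital/12345", "Digitized version")
```

A record describing only the digital object, rather than the print book, would normally use a second indicator of 0 instead.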
Challenges
This chapter focuses on the challenges faced and solved mainly during
the implementation of LBI's digitization program in 2005/2006 as
well as during follow-up projects. The problems were typical for
projects which engage in new technologies, ranging from technical to
organizational hurdles. Some of the difficulties which we encountered
were connected to the special nature of rare books as a medium.
Challenge 1: Outsourcing a Digitizing Project
Choosing the right vendor, developing a set of selection criteria
(including the testing of sample files), negotiating a contract, and
monitoring and testing the results proved to be a very challenging
endeavor.
During the selection process for a digitization vendor, we
developed the following set of criteria and asked vendors to submit a
proposal with information and price quotes for the following areas:
• Master files (TIFF) 300, 400, 600 ppi in bi-tonal, grayscale
and color
• Derivatives (jpeg, jpeg 2000) 300, 400, 600 ppi in bi-tonal,
grayscale and color
• Scanning oversized materials, foldouts
• Technical and structural metadata
• File naming
• Cameras used
• Book cradle
• File storage and delivery
• OCR (Optical character recognition)
• Insurance
• Security (handling of books, security of lab area, book tracking
system, etc.)
• Shipping and delivery
• Quality control
• Other costs
• References
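The resolution and color options in the list above largely determine file sizes, and with them storage and delivery costs. A back-of-the-envelope estimate for uncompressed master files can be sketched as follows. The bit depths (1, 8, and 24 bits per pixel) and the 6 x 9 inch page are assumptions chosen for illustration, and real TIFF files will differ somewhat because of headers, row padding, and any compression.

```python
# Assumed bit depths per pixel for each scanning mode.
BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, ppi, mode):
    """Approximate uncompressed image size in bytes for one page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes


def size_table(width_in, height_in, resolutions=(300, 400, 600)):
    """Sizes for every resolution/mode combination, keyed by (ppi, mode)."""
    return {
        (ppi, mode): uncompressed_bytes(width_in, height_in, ppi, mode)
        for ppi in resolutions
        for mode in BITS_PER_PIXEL
    }


# Hypothetical 6 x 9 inch page.
for (ppi, mode), size in size_table(6, 9).items():
    print(f"{ppi} ppi {mode:<9} {size / 1_000_000:8.1f} MB")
```

Multiplying a per-page figure by the expected page count gives a first estimate of the storage a vendor quote needs to cover.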
We provided vendors with samples from our collection and asked
them to provide us with scanned images and metadata. The scanned
images underwent a quality check, and the metadata was checked against
standard recommendations. Ideally, we would have loaded the sample
files into our digital management system, but at the time of our vendor
negotiations the system was not yet fully implemented.
Overall we compared the vendor proposals and the samples very
thoroughly, especially the price quotes for different scanning
scenarios, the setups of the scanning labs, the quality of the scanned
images, and the vendors' ability to produce structural metadata for
our digital management system. The determining factors were the
book cradle solutions and security features (e.g., handling of books,
security of lab area), the quality of the scanned images, the ability to
provide complex metadata as well as the price differences.
During the negotiation process we encountered variations of
important technical details (e.g. file types, file naming, cameras used,
technical and structural metadata, OCR) which helped us to make
decisions, which we then incorporated into the final contract. The
contract spelled out our decisions in the listed areas. We learned that
the contract should describe as many technical details as
possible.
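Once details such as file naming are fixed in a contract, deliveries can be checked against them automatically. The sketch below shows one way this might look. The naming convention (a lowercase book identifier, an underscore, a four-digit page sequence, and a .tif extension) is purely hypothetical and is not the convention LBI agreed with its vendor.

```python
import re

# Hypothetical convention: <book id>_<4-digit page sequence>.tif
NAME_PATTERN = re.compile(r"^(?P<book>[a-z0-9]+)_(?P<seq>\d{4})\.tif$")


def check_delivery(filenames):
    """Check a delivered batch of file names against the convention.

    Returns (bad_names, gaps): bad_names lists names that do not match
    the pattern; gaps maps each book id to the page sequence numbers
    missing between 1 and the highest number delivered.
    """
    bad_names = []
    seen = {}
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            bad_names.append(name)
            continue
        seen.setdefault(match["book"], set()).add(int(match["seq"]))

    gaps = {}
    for book, seqs in seen.items():
        missing = sorted(set(range(1, max(seqs) + 1)) - seqs)
        if missing:
            gaps[book] = missing
    return bad_names, gaps


bad, gaps = check_delivery(["rb001_0001.tif", "rb001_0003.tif", "RB001-2.TIF"])
```

A check of this kind only confirms that names follow the agreed form; image quality and metadata still need separate review.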
Challenge 2: Medium Rare Bound Books from Previous
Centuries
One of the biggest challenges in digitizing rare books from previous
centuries is that flatbed scanning is usually not an option,
since bindings cannot be removed for the scanning process. Rare
books in our collection are usually treated as artifacts: the content is
as important as the bindings and physical appearance. Older books
are often very tightly bound and cannot be opened 180 degrees; very
commonly they can only be opened 90 to 120 degrees.
Many of the scanning services which we approached were only
equipped for flatbed scanning and declined to take on projects with
bound books. Commercial solutions explicitly for rare book cradles do
not seem to exist. Those few vendors we encountered which also scan
bound books usually built or customized their own solutions for rare
book cradles. Some used commercial book cradles for modern books
which can be opened 180 degrees and added features to make them
suitable for books which cannot be opened 180 degrees. Other
vendors constructed adjustable boards to hold the books in place
without applying pressure during the scanning process.
The Gruss Lipper Digital Laboratory at the Center for Jewish
History designed and uses two book cradles: one has a fixed 90
degree angle, the other a fixed 120 degree angle. The cradles are made
of acrylic and covered with black velvet. Velcro straps keep the book in
place. The use of two "fixed" angle cradles turned out to be a simple but
successful solution.
The different book cradles which we encountered represent
different solutions for accommodating books which can only be
opened at an angle less than 180 degrees. It is hard to make a general
recommendation for a best solution. Depending on the type of books
and condition of their bindings some cradles are better or less suited
for the task of supporting fragile bindings and spines and of exerting
as little pressure on the books as possible. For some books it is
possible to use glass plates, but glass plates put too much pressure on
most of the books. It is a trade-off: scanning without glass plates
very often results in shadows which often cannot be removed by
adjusting the lighting. Overall, those solutions which supported
both the front and back bindings were better suited for our
specific collection.
Rare books require additional manual handling time to secure
the volume after each page is turned, which makes the process more
costly than scanning regular books.
Challenge 3: Technical and Structural Metadata for Compound
Objects
Not all scanning vendors are capable of providing the necessary
technical and structural metadata for compound objects. During our
search for a scanning service in 2005/2006 we realized that
digitization vendors came from different backgrounds -
photographers, microfilming companies, library services,
bookbinders, etc. - and were thus often unaware of, or unable to
adjust to, emerging library best practices and standards. Many of
them could deal with single digital objects and were equipped to
deliver the technical metadata, but could not provide ready-to-go
structural metadata for building complex digital objects.
Our digital management system, DigiTool (Ex Libris), requires the
use of METS (Metadata Encoding & Transmission Standard, 2010) for
storing structural metadata along with descriptive, administrative,
technical and other kinds of metadata in an XML "wrapper." During
our vendor search in 2005/2006 we found that many vendors were
either unaware of this type of metadata or unable to provide it
with the scanned images. If the scanning process is outsourced, it is
very advisable that the vendor provide structural metadata at the
same time; otherwise it becomes a very tedious and time-consuming
endeavor to match the digital objects to a structure map afterwards.
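To illustrate what "structural metadata" means here, the following is a minimal sketch, not the files our vendor delivered or the exact form DigiTool ingests. It uses Python's standard library to build a bare METS document in which a file section lists the page images and a physical structure map puts them in reading order. The function name, the file names, and the choice of a single "book" division are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(book_label, page_files):
    """Return a <mets> element listing page images in reading order.

    book_label and page_files are illustrative inputs, not values
    taken from the project described in this chapter.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # fileSec: one <file> entry per delivered page image.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="master")

    # structMap: the physical structure (book -> pages) pointing at the files.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    book = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", TYPE="book", LABEL=book_label
    )

    for order, filename in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file", ID=file_id, MIMETYPE="image/tiff"
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": filename},
        )
        # Each page division records its position and points at its image.
        page = ET.SubElement(
            book, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(order)
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return mets


if __name__ == "__main__":
    doc = build_mets("Sample volume", ["0001.tif", "0002.tif", "0003.tif"])
    print(ET.tostring(doc, encoding="unicode"))
```

When the vendor records the page order at capture time in a form like this, nobody has to reconstruct it later from several hundred loose image files.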
During the testing phase we asked the vendors to supply us with
metadata, preferably METS. For our initial project we could find
only one vendor who could provide us with METS files. In the
first phase of our project we additionally discovered that we needed
translation code between the METS ALTO version that the vendor
could create and the METS ALTO version that our digital
management system understood.
Challenge 4: Organizational Parameters & Limitations of
Digital Management Systems
We had the "problem" that we had to use a digital management
system which was chosen by our umbrella organization and which was
not fully implemented when we started our digitization
program. The challenge was to bridge the shortcomings of the system
and the shortcomings of what digitization vendors could provide, a
typical problem when a technological field has not yet developed
mature and commonly accepted standards. The first versions of
our digital management system did not accept structural METS files
in the METS ALTO standard, so we had to find additional funding to
develop translation code between METS ALTO (what vendors
were able to produce) and the METS "dialect" that our system
understood, which resulted in higher costs.
Organizational set-ups and the choice of the technical system
have a great influence on the direction and the outcome of a project.
These factors should not be underestimated: they influence the
timeline and the financial framework of a digitization project.
Challenge 5: Keeping up with Evolving Digital Standards
One of the challenges we faced in follow-up projects was to keep
up with evolving and maturing guidelines for digitization. One of the
lessons learned was that it is necessary to spell out technical
requirements in detail in a contract and to make sure that they are
applied, and not to take them for granted. General formulations can
lead to ambiguous interpretations.
Example 1: An area of dispute in one of our projects was whether it is
"allowed" to enhance or sharpen master files during the scanning
process. Many labs follow the NARA guidelines (see National
Archives, 2004) and do not apply ANY sharpening or other
"enhancements" during capture or to master files. If errors are
detected during capture, the camera is calibrated. One of the
vendors enhanced the master files instead of calibrating the camera.
Our contract read: "Quality of image: Files will be inspected for
sharpness, image contrast, density and faithfulness to original and
improved if necessary." That should only have been allowed for
derivatives, and not for master files. The master files should be true to
the original materials; the colors and contrast of the digital files
should accurately represent the original pages. Many of the master
images had too much contrast (visible in bad histograms), which
probably came from sharpening during scanning. In that respect our
contract was not specific enough: we did not specify that only
derivatives could be altered, and not master files.
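A "bad histogram" of this kind can be screened for automatically. The sketch below is an illustration rather than the check we actually used. It counts how many 8-bit grey values sit at the extreme ends of the tonal range, where excessive contrast tends to pile pixels up. The function names and the 1% limit are arbitrary assumptions, and any real acceptance threshold would have to be agreed with the vendor.

```python
def clipping_report(pixels, shadow_max=0, highlight_min=255):
    """Return (shadow_fraction, highlight_fraction) for 8-bit grey values.

    pixels is any iterable of integers in the range 0-255.
    """
    histogram = [0] * 256
    total = 0
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"not an 8-bit value: {value!r}")
        histogram[value] += 1
        total += 1
    if total == 0:
        raise ValueError("no pixels supplied")
    shadows = sum(histogram[: shadow_max + 1]) / total
    highlights = sum(histogram[highlight_min:]) / total
    return shadows, highlights


def looks_clipped(pixels, limit=0.01):
    """Flag an image when either end of the histogram holds more than
    `limit` of all pixels. The 1% default is an assumed, illustrative value."""
    shadows, highlights = clipping_report(pixels)
    return shadows > limit or highlights > limit


if __name__ == "__main__":
    harsh = [0] * 10 + [128] * 80 + [255] * 10   # pixels piled up at both ends
    gentle = list(range(100, 150)) * 2           # mid-tones only
    print(clipping_report(harsh), looks_clipped(harsh))
    print(clipping_report(gentle), looks_clipped(gentle))
```

A check like this only flags candidates for visual inspection against the original page; it does not replace that inspection.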
Example 2: In our 2008 project it proved necessary to
spell out in the contract that the calibration of the camera must be
checked regularly to confirm that it results in an accurate image. In
our contract we had specified that the images produced should be
"faithful to the original" materials. The colors and contrast of the
digital files should represent the original pages accurately.
Our first vendor calibrated the camera only once at the beginning
of the scanning process to the "default" NARA guidelines. The results
were aggravating: the majority of the image files had extensive color
bleeding, a severe image quality problem. Ultimately we had to rescan
our books at another digitization facility.
Our contract was sufficient in the sense that we could rightfully
claim that the bad image results were a breach of contract. But we had
to rescan all our books - something which is definitely not advisable
for rare books. A line in the contract about periodic camera calibration
could have prevented this dilemma. We should have also done quality
checks periodically.
Lesson learned: The contract texts for projects in emerging
technological fields are often vague and only give a narrative
description of the wanted outcome since various schools of thought
exist in terms of technical specifications.
In many cases a narrative formulation in the contract about the
desired outcome is the only possible way to address an area of
concern, but in some cases it is not sufficient. The more of the hard
technical details that can be specified unambiguously, the better. It
is also very advisable to monitor the project frequently, to run several
test phases, and not to rely on past vendor performance.
Conclusion
The successful implementation of our digitization program was and is
based on a combination of persistence, serious initial testing, vendor
comparison, ongoing testing of results, questioning of vendors,
collecting of technical information, balancing of outside factors, and
changing of vendors when the outcome did not meet our expectations.
The most challenging aspect of our rare book digitization project has
been the steep learning curve of getting into the technical intricacies
of creating digital content, and balancing outside factors to get the
project on its way and moving. The most rewarding aspect has been to
make hitherto relatively inaccessible books accessible to a broader
audience.
An important lesson learned was that it is necessary to keep up with
evolving and maturing guidelines for digitization - and to make sure
that they are really applied, and not to take them for granted. Even
established workflows have to be reevaluated from time to time, since
the field changes all the time, and the technical details become more
detailed and sophisticated. Digitizing rare books is very manual,
labor-intensive, and ultimately cost-intensive work.
References
856 - Electronic Location and Access. (2003). Retrieved May 9, 2010
from http://www.loc.gov/marc/bibliographic/bd856.html
ExLibris the bridge to knowledge, Overview. (2010). Retrieved
March 15, 2010, from
http://www.exlibrisgroup.com/category/DigiToolOverview
Metadata Encoding & Transmission Standard. (2010). Retrieved May
9, 2010 from http://www.loc.gov/standards/mets/
National Archives. (2004). Technical guidelines for digitizing
archival materials for electronic access: Creation of production
master Files - raster images. Retrieved May 9, 2010 from
http://www.archives.gov/preservation/technical/guidelines.html
The Caprons of Paris: A Digitization
Project in a Small Library System
Misty De Meo (County of Brant Public Library)
Abstract
The County of Brant Public Library set out with the goal of
documenting the history of the founder of the town of Paris by
digitizing a collection of his original papers. By building contacts
within the community, the Library was able to successfully complete
its initiative and open new avenues for future projects. The equipment
and digitization methods used are described, with a special emphasis
on the ways in which the Library was able to work around its small
budget. The Library's complementary local history wiki, and the
project's methods of promotion, are also described.
Keywords: Digital collection, Digitization, Historical society, Local
history, Marketing, Web 2.0, Wiki
Introduction
One of the key goals of the County of Brant Public Library is to act as a
gateway providing the most accessible routes to information. The
County of Brant Public Library Digital Collections was envisioned as a
means by which local historical information which was currently
unknown or inaccessible to most members of the community could be
made easily and publicly available. The goal of this project was not
merely to provide information, but to provide the means by which the
community could interact with history and share information.
196 The Caprons of Paris
The County of Brant, located in southern Ontario, is a diverse
collection of unique communities, each with its own history. It has a
modest population of 34,415 divided among 30 distinct communities,
formerly separate municipal entities, spread out over 843 km²
(Statistics Canada, 2006). The Library has five branches spread
throughout the County. It was impossible for a single project to
document the County as a whole. This digitization project was
designed as the first part of an ongoing digitization program which
would eventually represent the entire "community of communities"
that makes up the County. The short-term goal was to document the
history of the town of Paris, the largest community in the County of
Brant, through its founder Hiram Capron; the long-term goal was to
build an ongoing program documenting all of the communities in the
County, using the Capron project as a method to open doors and
create momentum.
The keys to the success of this project have been technical
flexibility and innovation, which enabled success within a modest
budget, and community engagement, which provided the support,
contacts, and materials necessary to grow a small project into a pair of
ongoing initiatives.
Project selection
The Library evaluated a number of potential project partners when
planning this project, from communities across the County.
Unfortunately, for various reasons, we were not able to work with all
of the local historical societies we approached.
Most importantly, we needed a critical mass of original materials
- a body of materials large enough to form a hypothesis as the basis of
a project. Local history is exponential - pursuing a sufficiently large
collection of materials leads to new collections and new potential
partners. While many potential projects had seemed promising, a
number proved to be unsuited to this project on examination; some
collections were small and lacked that force behind them which we felt
was necessary to give us an "in" to the community, while others
proved to contain many inauthentic items and reproductions which
were unsuited to this project, which was focused on original primary
historical documents.
An ideal project was eventually found in the Paris Museum and
Historical Society, whose archives hold a large body of original
documents. Hiram Capron, the town's founder, is an important figure
in the history of Paris and is almost unusual in that his legacy is by
and large authentically supported by the documents the museum
holds. The museum was willing to consider a partnership, and
provided the Library with open access to its Capron collection. At the
time of the agreement, approximately 50 items were identified for
digitization in the project. The "critical mass" criterion proved apt: at
the time of writing (March 2010) the Library has digitized a collection
of over 425 items, provided by the museum and by new contributors,
as a part of the Hiram Capron project and further projects. By
beginning with the founder of Paris, we have been able to broaden the
perspective to provide a wide variety of materials about the town's
history and to forge partnerships with new donors.
Several historical societies have expressed concern about retaining
ownership and control over their items. One of the necessary keys to
developing partnerships was to show potential project partners that
digitization could enhance the value of their collections and their
services, rather than replace them. Our partners at the Paris Museum
and Historical Society have reported that their visits have increased
since the launch of the digital collections websites.
Management and staffing
The project involves three key members of the Library. The Library's
CEO, Gay Kozak Selby, originated the project and secures funding
from outside sources; she performed initial project planning, research,
and hiring with the library's e-resource librarian, Christine
MacArthur. Christine was also responsible for day-to-day
management and co-ordination of promotion. Scanning, technical
management, and design of promotional materials were the job of the
professional archivist, Misty De Meo.
In addition, the project has made use of shorter-term student
workers in order to handle particular extra tasks and features. In the
summer of 2009, the Library employed a university student using
funding from Young Canada Works to provide transcriptions of a large
number of handwritten documents in the Capron collection in order
to make the full text searchable online. Other projects of this nature
are anticipated in the future.
Funding
Funding was provided through the Library Strategic Development
Fund, an ongoing grant operated by the Ontario Ministry of Tourism
and Culture with the goal of assisting innovative projects and
community development. The Library was one of the recipients for the
2008-2009 grant, receiving $18,405. This grant was critical to the
success of the project; it could not have gone forward without external
funding. It provided for equipment costs, the hiring of a professional
archivist to supervise and perform digitization, and some of the
project's operating costs. The total operating budget was $35,000,
with the remainder of the budget paid by the Library. Summer student
work has been funded through Young Canada Works, a program
operated by Canadian Heritage. The Library's primary website was
provided by a grant from Knowledge Ontario.
Content Management System
A number of factors influenced the selection of the content
management system used by the host for the digital collection. These
included ease of use and intuitiveness of user interface; advanced user
interface features; searchability; and accessibility through external
search services, such as Google. While the Library holds a license to
use SirsiDynix Hyperion, we felt that it was insufficient for our needs
after evaluating a neighboring institution which had used it (King
Township, 2008). Its interface is somewhat cumbersome and
simplistic; we decided that its "look" would turn off a large number of
visitors. It also lacks advanced features for object descriptions or
supplementary viewing options, such as contextual maps or
"zoomable" image views. Perhaps most importantly, its results are not
available through Google - we recognize that most of our potential
visitors will be using tools other than our own built-in search engine
to discover content, and so leaving our content out of Google
needlessly keeps our content away from interested visitors.
In the end we selected a website toolkit called VITA created by the
Our Ontario division of Knowledge Ontario, a local organization which
provides a variety of digital services and content. Our Ontario, which
focuses on providing access to digitized historical materials, provides
grants to small organizations and so we were able to obtain use of the
software and hosting free of charge.
Our visitors have generally been impressed with the advanced
features VITA makes available. One of the most popular features is its
Google Maps integration. Each item can be tagged with a set of
coordinates that will cause a Google Maps widget to display below the
item. We have used this extensively with items such as historical
maps, which allows us to display a modern image of the town centered
on the same location as the map. Another popular feature is the
integrated "Zoomify" software, which allows the user to zoom in to an
oversized item at greater levels of detail, or to pan and rotate. An item
which demonstrates both of these features is available at
http://images.ourontario.ca/brant/details.asp?ID=68322
VITA is also designed around modern Web 2.0 search methods;
consequently, in addition to its own built-in search, it makes all of its
content available via Google and permits material access and sharing
through RSS and social networking sites (Knowledge Ontario, 2009).
Thanks to this, our VITA site features prominently on the first page of
Google results for key terms such as "Hiram Capron". Our site makes
significant use of user-interaction features, encouraging users to
comment and contribute information. Many items contain "mystery
questions" inviting users to submit information that is missing about
an item.
Our Brant: local history, local voices
The project, as originally planned, comprised only one digital history
site displaying primary documents. Since then, we have made the
decision to launch a second local history website at the same time as
our primary site, currently available at http://ourbrant.wikia.com/. As
materials and contributions for our primary site were being prepared,
it became clear that a great deal of information would not be
appropriate for inclusion in the primary site. As is almost certainly the
case with most established communities, much of the history of Paris
has by this time become a matter of tradition which is not entirely
substantiated by the remaining documentation.
The goal of the primary Digital Collections site was to provide
original historical documents, photographs and other items in a
digital format, both for the purposes of historical research and to
allow people to learn about the town's history. For this second
audience, it was especially important to provide peripheral materials
which slotted each item into the "narrative" of historical events in
order to build understanding, which was achieved both through item
descriptions and through explanatory photo essays. However, because
the site is focused on the documents and historical evidence more
than the story, it has been considered very important that the
narrative presented is not at odds with the historical documents and
that it does not make assertions which are not supported by the
documents. We were further interested in enhancing interaction with
our users and clients; we felt that the Web 2.0 style interactive
features in the VITA website software were among its greatest
strengths, and we looked for further opportunities to enable this kind
of interaction within our communities.
As a result, it was decided to add a second site to the project. The
goal was to provide an appropriate location for people from the
County to document the area's history as they personally understood
it. With a focus less on hard historical facts and records, it would
provide an appropriate place to record these popular anecdotes of
local history. Hence, we defined the primary goal of this site as
providing a place for anyone from the county to share their personal
histories.
Based on this goal, we decided to provide spaces for the following
types of content: (1) Personal memories and life stories; (2) Family
histories; and (3) Profiles of notable local residents, buildings, and
organizations.
The wiki format was judged to be most appropriate for this
application due to the ease with which it facilitates collaboration and
open contributions with a relatively low barrier of entry for new users.
The wiki format was also judged appropriate due to the abundance of
services which provide free hosting, such as Wikia
(http://www.wikia.com/), as this would eliminate a financial barrier
to the creation of the site.
While we have found other local history wikis during the planning
process, we have not found another site with the same focus and
consequently much of Our Brant's design is entirely original. Many
sites we have found focus on locations as grounding points for
memory, such as Placeography (http://www.placeography.org/) and
Zurbu (http://zurbu.net/), while a few others, such as the Wagga
Wagga Local History wiki (http://waggalocalhistory.wetpaint.com/),
focus on events. Furthermore, a number of these wikis are not open to
the public for editing, such as the Alexandrina Local History Wiki
(http://alexhistory.pbworks.com/) or the Montana History Wiki
(http://montanahistorywiki.pbworks.com/). Our Brant differentiates
itself from these other services by providing open access for editing to
any registered or anonymous users, and by placing a strong focus on
individual history and personal memory rather than using the anchors
of locations and subjects.
Input Methods
Ease of editing was considered to be one of the most important
features. Many of the residents in Paris and the County who are
interested in sharing local history are not very comfortable with
computers. Despite the familiarity of Wikipedia, it is not simple to
create pages without using a special markup language (Baker, Hoover
& Rose, 2009); given our probable audience, this was not considered
acceptable. While Wikia includes a rich text editor which abstracts
markup language from the user on individual pages ("Rich Text,"
2009), it does not automate elements such as page structure or link
structure. We chose to adopt the forms-based input system used by
Baker et al. on Placeography, which presents the user with a simple set
of input boxes and checkmarks using software called Semantic Forms.
Once submitted, the result is an attractive, professional-looking page
without requiring the user to employ any markup language.
Another benefit of the Semantic Forms software, and the
Semantic MediaWiki software on which it is based, is the ability to
perform queries on the information entered in form fields (Semantic
MediaWiki, 2009). Our Brant uses this extensively to build the index
pages which allow users to browse the site's content. Because these
indices are based on queries, they update in real time to include newly
added content; this has simplified page creation by eliminating the
need for users to create links to their pages (e.g., see Figures PARIS-1
to PARIS-3).
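The idea of a query-driven index can be sketched in a few lines of Python. This is an illustration of the principle only, not Semantic MediaWiki's implementation or its query syntax. The record fields ("kind", "located_in") and the "Example Mill" entry are hypothetical stand-ins for the values a contributor would enter on a form.

```python
from collections import defaultdict

# Hypothetical page records; on the wiki these values would come from
# the fields a contributor fills in on the form.
PAGES = [
    {"title": "Asa Wolverton House", "kind": "place", "located_in": "Paris"},
    {"title": "Hiram Capron", "kind": "person", "located_in": "Paris"},
    {"title": "Example Mill", "kind": "place", "located_in": "Example Community"},
]


def build_index(pages, community):
    """Group the titles of all pages tagged with `community` by page kind."""
    index = defaultdict(list)
    for page in pages:
        if page["located_in"] == community:
            index[page["kind"]].append(page["title"])
    return {kind: sorted(titles) for kind, titles in index.items()}


if __name__ == "__main__":
    print(build_index(PAGES, "Paris"))
    # A newly submitted page shows up the next time the query runs;
    # nobody has to edit the index page by hand.
    updated = PAGES + [
        {"title": "Paris Baptist Church", "kind": "place", "located_in": "Paris"}
    ]
    print(build_index(updated, "Paris"))
```

Because the index is computed from the page data rather than maintained as a list of links, a contributor only has to fill in the form correctly.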
The community pages provide the central browsing interface to
the site's content, using queries to aggregate together all content
which is marked as belonging to this community. Each community
page also contains links to the forms for creating new pages, and each
individual page contains an "edit with form" link which allows it to be
altered using the same form interface.
Participation
Baker, Hoover and Sherman (2009) note the importance of
building a community in order to create a successful wiki, and we took
their lessons to heart when preparing Our Brant. We approached local
community members about contributing to the site prior to its launch;
these included the president of the historical society we partnered
with as well as stakeholders in the area's history, such as the owner of
the historic Asa Wolverton House in Paris. In addition, we drew on
our own resources and digitized transcripts of oral history interviews
that the Library had conducted in the 1970s and 1980s. This allowed
us to present the site at launch with a significant amount of content to
attract users and to provide examples of what they could contribute.
[Screenshot: the Our Brant "Edit Place" form for the Asa Wolverton House, with fields for address, community, year built, year renovated, original function, current function, photos with captions, and a complete description of the place.]
Figure PARIS-1 Sample form for creating a page for a building
[Screenshot: the resulting Asa Wolverton House page, including a "Memories about Asa Wolverton House" section.]
Figure PARIS-2 The result page
[Screenshot: the Paris browse index, with lists of page links under the headings Memories, People, Family histories, and Places (including Churches), and links inviting visitors to add their own memory or a missing person.]
Figure PARIS-3 The browse index for Paris. (Shortened, as the complete list is too
long to be shown.) The lists of memories and of people are generated by queries and
are updated automatically whenever a new page is created.
This proved to be a success; when examining our statistics, we
found that visitors had discovered our site even before the launch and
were sharing links to our content online (Qua, 2009). In addition to
complete content, we have also seeded the site with a large number of
"stub" pages - short articles with only a minimum of content, which
contain invitations to the reader to complete them.
In order to help promote usage of the site to an audience that may
not have access to the internet, we have also started a program to
accept site contributions on paper; this ensures that those who do not
own computers or who do not feel comfortable with computers can
still contribute to the site.
Digitization standards
Our digitization standards were based on those recommended by
a number of sources, primarily the U.S. National Archives and
Records Administration (2004), and the Canadian Museum of
Civilization (2007). Due to the size of our organization we were not
able to completely adhere to these conditions; very specific standards
pertaining to digitization room design and monitor calibration are not
realistically achievable using the budgets of most small organizations.
Our standards include both a web display format and a preservation
format, which is meant to remain accessible and usable in the long
term.
               Preservation masters            Display copies
Format         8-bit TIFF (from scanner);      8-bit JPEG
               DNG camera raw (from camera)
Resolution     600 dpi (from scanner)          Minimum 1000 pixels along
                                               longest end; higher when
                                               necessary
Colour space   sRGB IEC61966-2.1               sRGB IEC61966-2.1
We selected the Digital Negative (DNG) format as a master
format for camera images instead of TIFF because of the increased
flexibility it offers (Adobe, 2009). While proprietary "raw" formats are
archivally unsound, because of their closed specifications, DNG offers
a manufacturer-neutral format for this data. Adobe has made the
specification publicly available, which ensures that it can be supported
by future software. There is no archival consensus on the use of DNG
as a preservation format (Hess, 2009), but we feel that its benefits
outweigh any risks. When storing preservation copies on media
without significant limitations on storage space, such as the County's
central hard drives, we also store uncompressed TIFF copies of
camera images.
Our standard workflow is as follows:
• Import image from scanner or camera raw into Photoshop as
16-bit RGB, using the sRGB profile
• Perform any necessary colour correction and cropping
• Convert image to 8-bit colour depth
• Enlarge image by 0.5 inches or 1 inch
• Type item number in lower right corner, and partner
information (if appropriate) in lower left corner
• Save preservation TIFF
• Resize item to display size and perform sharpening
• Save web JPEG
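The arithmetic behind two of these steps can be sketched in a few lines of Python. This is a minimal illustration, not the Photoshop actions used in the project: it assumes that "enlarging" means adding a border to the canvas to make room for the item number, and that the display copy is scaled so that its longest edge equals a target of 1000 pixels (the minimum given in the standards above). The function names are hypothetical.

```python
def border_pixels(border_inches: float, dpi: int) -> int:
    """Pixels needed for a canvas border of the given width at the given resolution."""
    return round(border_inches * dpi)


def display_size(width: int, height: int, longest: int = 1000) -> tuple[int, int]:
    """Scale (width, height) so the longest edge equals `longest`, keeping the aspect ratio."""
    scale = longest / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 0.5 inch border on a 600dpi master.
    print(border_pixels(0.5, 600))
    # A letter-sized page scanned at 600dpi (5100 x 6600 pixels), resized for the web.
    print(display_size(5100, 6600))
```

A larger `longest` value can be passed for items whose detail would be lost at 1000 pixels.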
Books
A special note should be made of our standards in digitizing
books. Many digitization projects, such as Google Books, use
computer processing to flatten pages, remove bindings, and recolour
pages to a bitonal, or pure black and white, format. We chose to use a
different presentation method. This digitization project is primarily
archival in nature, and most of the bound material selected was
chosen for its archival qualities as much as its informational qualities.
A guiding rule of archives is that context is as important as content in
reading meaning from an item (Van Ballegooie & Duff, 2001), so our
standards are designed in order to ensure that the context of a book's
usage is evident in our digitized version.
A good example of this is our digitized copy of Frederic A.
Holden's 1859 Genealogy of Banfield Capron, a family history of the
Caprons. The copy available to us was Hiram Capron's personal copy,
which was later owned by a succession of other Caprons who have
made extensive annotations and additions to the book.
Figure PARIS-4: Page 107 from Google Books's edition
Figure PARIS-5: Our edition of this page, showing the original book's condition. The
page behind the attached card was also digitized separately.
These annotations are as important for our purposes as the
original book. They identify some of its owners, and provide
contextual information on how this family history was used and why it
was important to them. For that reason, we have chosen to digitize
these archival books in full colour with any annotations or additions,
maintaining the full size of the page and bindings.
Metadata
To simplify metadata collection given our resources, the Library
has focused on using the VITA software's metadata for our digitized
items. Metadata is entered at the time a record is prepared for display
in VITA, and is preserved by saving local copies in the VITA and
Dublin Core formats. VITA's metadata is stored in an XML-based
format, which allows translation into other formats as needed in the
future. Metadata is standardized using a number of mandatory and
preferred fields, and supplemented with any additional relevant fields
per item.
Mandatory:                   Preferred:
Contributing Partner         Citation
Copyright Status             Collection
Language of Description      Copyright Date
Media Type                   Copyright Holder Name
Title                        Copyright Statement
                             Creator Name
                             Date(s) of Original Donor
                             Earliest Year
                             Latest Year
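A check for mandatory fields before a record is published could look like the sketch below. It is only an illustration: it assumes that the mandatory fields are Contributing Partner, Copyright Status, Language of Description, Media Type and Title, and it models a record as a plain Python dictionary. It does not reflect VITA's actual storage format or any VITA API, and the sample record is invented.

```python
# Assumed mandatory field names; a record is modelled as a plain dict.
MANDATORY_FIELDS = (
    "Contributing Partner",
    "Copyright Status",
    "Language of Description",
    "Media Type",
    "Title",
)


def missing_mandatory(record: dict[str, str]) -> list[str]:
    """Return the mandatory fields that are absent or blank in `record`."""
    return [name for name in MANDATORY_FIELDS if not record.get(name, "").strip()]


if __name__ == "__main__":
    # Hypothetical record with a blank title and no media type.
    sample = {
        "Contributing Partner": "Paris Museum and Historical Society",
        "Copyright Status": "Public domain",
        "Language of Description": "English",
        "Title": "  ",
    }
    print(missing_mandatory(sample))
```

Preferred fields could be reported the same way, as warnings rather than errors.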
Preservation
Both the preservation master copies and the web copies are
preserved, along with metadata and any OCR/transcription data. Two
physical copies are created on archival quality DVDs; one set is kept
by the Library, and the other by our project partner. In addition,
supplemental preservation copies are stored on the County's servers.
Equipment
The original project plan called mainly for the digitization of flat
documents consisting of single pages or a small number of pages, and
the project's equipment and software were purchased with these types
of documents in mind. The original equipment purchased for this
project consisted of one computer workstation, with Adobe Photoshop
CS4 software; two scanners; and general archival supplies.
Scanners
The Epson GT-20000 (Epson), the first scanner purchased, is
advertised primarily as a workgroup scanner, and its primary purpose
is scanning large size documents. It supports documents of a size up
to 11.7" x 17", at a resolution of 600dpi. It was purchased on the
recommendation of our partners at Our Ontario.
The V500 (Epson) is intended primarily as a photo scanner; it
was purchased as a secondary unit. It supports resolutions as high as
6400dpi, with documents up to 8.5" x 11.7" in size, and unlike the GT-
20000 it is able to scan photo negatives - this was the primary reason
it was chosen. Both scanners support colour depths of up to 48-bit
colour.
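To give a sense of what these figures mean for storage, the sketch below works out the pixel dimensions and uncompressed size of a scan. It is a back-of-the-envelope calculation only: it assumes an uncompressed RGB image with no file-format overhead, and the function names are hypothetical.

```python
def scan_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scan of the given physical size at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Raw image size in bytes, ignoring headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # The largest document the GT-20000 supports, at 600dpi.
    w, h = scan_dimensions(11.7, 17, 600)
    print(w, h)
    print(uncompressed_bytes(w, h) / 1_000_000, "MB as 8-bit RGB")
    print(uncompressed_bytes(w, h, bits_per_channel=16) / 1_000_000, "MB as 16-bit RGB")
```

A full-bed scan at 600dpi therefore occupies roughly 215 MB as an uncompressed 8-bit RGB file, and twice that at 16 bits per channel.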
Book scanning
As the project progressed, new material came forward and it
became clear that bound books and book-shaped materials would also
need to be scanned. The majority of these items were Hiram Capron's
personal account ledgers dating back to 1828, and were very fragile. It
was clear that their brittle spines would not survive the bending
necessary to flatten the book for use in a flatbed scanner. In addition,
the natural curvature of ledger and book pages when laid out flat on a
scanner would produce sub-optimal images, especially when
performing OCR (Clements, 2009). Another concern was the scanning
of oversized objects, as many items in the collection included elements
which were larger than the 11.7" x 17" size that the largest scanner
could image.
We decided to use photographic imaging for these items, which
would free us from physically fitting items into a flatbed scanner.
While we were familiar with commercial book scanning cradles, such
as those produced by Atiz (Atiz) and Kirtas (Ristech), our equipment
budget for this project did not permit so expensive a purchase.
Consequently, we constructed our own simple cradle at a low cost.
The only supplies required for the cradle itself were:
• Two white foldout foam presentation display boards:
Presentation boards proved to be ideal because they are
segmented and naturally fold out. This allowed the longest ends
to be raised at a 90-110° angle from the surface of the table,
providing the surface for a V-shaped cradle.
• Weights to hold the boards in place: The boards need to be held
in place using physical weights of some sort, both to provide
support to hold the body up at an angle and to prevent slippage
from the weight of the book or artifact being digitized. Weights
were easily produced from salvaged supplies. Binders were
initially used, but these proved insufficiently heavy to guard
against heavy objects slipping. Boxes of books from storage were
ultimately the ideal solution, and certainly plentiful in any
library.
• One camera, with an SD card: A discussion of the selection of the
camera is contained in the next section.
• One tripod
• One plate of glass to hold down pages (optional): The use of
glass permitted curling pages to be held flat while being
photographed. However, because a polarizing filter was not
available, glare prevention is a challenge. The plate of glass was
taken from an unneeded spare picture frame.
• Two large sheets of black paper, providing a scanning surface for
items to rest on
Figure PARIS-6: The book cradle.
The total cost of the supplies needed came to less than $700, and
many of the supplies were already available at the Library. The only
parts which needed to be newly purchased were the camera and its
memory card, at a cost of approximately CAD $550.
The photograph above depicts how the setup functions. The book
or artifact being digitized is placed in the centre of the cradle. Next to
it, the camera is placed on the tripod and angled facing down at the
page. Because only one camera was used, only the even or odd pages
are digitized at a time; after completing one pass, it is necessary to flip
the book and photograph the other pages. After photographing, the
images can be transferred to the computer for processing. The use of a
consistent angle and book positioning means that the images can be
automatically processed using Photoshop batch processes, so it is not
necessary to manually crop individual pages.
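Because each pass captures only one side of the book, the two sets of images have to be merged back into reading order before further processing. A minimal sketch of that step follows. It assumes both passes were shot from the front of the book to the back and that the first pass holds the pages that come first in each pair; the file names are hypothetical.

```python
from itertools import zip_longest


def interleave_passes(first_pass: list[str], second_pass: list[str]) -> list[str]:
    """Merge two single-side passes into reading order.

    The passes may differ in length by one when the book has an odd
    number of photographed pages.
    """
    merged = []
    for first, second in zip_longest(first_pass, second_pass):
        if first is not None:
            merged.append(first)
        if second is not None:
            merged.append(second)
    return merged


if __name__ == "__main__":
    right_pages = ["IMG_0001.dng", "IMG_0002.dng", "IMG_0003.dng"]
    left_pages = ["IMG_0101.dng", "IMG_0102.dng"]
    for position, name in enumerate(interleave_passes(right_pages, left_pages), start=1):
        print(position, name)
```

If the second pass is shot from back to front instead, the second list would need to be reversed before merging.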
The cradle provided significant advantages when compared to
other options. It allowed books and ledgers to be digitized clearly and
legibly while providing pages which are flatter in appearance than
those scanned using a flatbed scanner. The use of a camera also meant
that capturing each image was significantly more efficient, because
photographing a page is much faster than scanning it on a flatbed
scanner. In addition, the use of presentation boards
meant that the size of documents could be very large - as large as 36"
x 24". The greater limit on item size was camera resolution. The use of
such a large surface meant that the cradle was also useful for digitizing
other oversize archival records which could not fit in the flatbed
scanners.
When selecting a camera for use with the cradle, a number of
criteria were considered. The criteria which guided our decision were:
Resolution. While most high-end digital cameras have more
than sufficient resolution to digitize average-sized books and items,
the collections being digitized included some very large objects
containing fine-grained detail such as a 19th century atlas that
measured 15" x 17.5". These items necessitated a camera with a very
high megapixel count in order to produce legible images.
Raw compatibility. The "raw format" is a type of "digital
negative" (Torborg, 2008), which allows cameras to record information
exactly as received by the sensors; this enables more advanced post-
processing (Canon, 2008). The most valuable feature this permits is
quick and accurate colour correction.
Cost. Any camera selected had to fit within the equipment budget
remaining for the project, which eliminated DSLR cameras of a high
resolution.
Compatibility with professional book digitization
equipment. If a future budget permitted the purchase of
professional book digitization equipment, it would be most beneficial
to be able to use the camera which had already been purchased in
order to reduce costs.
DSLR (Digital Single-Lens Reflex) cameras are popular choices
for book digitization (Torborg, 2008), particularly because of their
high imaging quality and higher dynamic range (Wan, 2008).
However, for this project another camera was selected which better
met our needs. In particular, the requirement of a high megapixel
count ruled out most entry-level DSLR cameras. Based on these
criteria, we selected the Canon PowerShot G10, at the time the
highest-end "prosumer" Canon PowerShot camera available. With its
14.7 megapixel sensor 1 (Canon, 2008), it was capable of capturing
sufficiently fine detail in large items such as the atlas. It also met the
other criteria because it supported the raw format. There were no
DSLR cameras within the budget available with a comparable
resolution; the resolution was considered an acceptable trade-off for
image quality and benchmarks indicate that, for the purpose of this
project, the G10 is within an acceptable quality range compared to
similar DSLRs (DxO Labs). It provides excellent image quality for
digitizing books and documents even in non-ideal lighting conditions.
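The link between megapixel count and legibility can be made concrete by estimating the effective resolution, in pixels per inch, that a camera delivers when an item fills the frame. The sketch below is an illustration only. It assumes a maximum image size of 4416 x 3312 pixels for the G10, a figure that is not stated in this chapter and should be checked against the camera's documentation, and it ignores the margin normally left around an item.

```python
def effective_ppi(sensor_px: tuple[int, int], item_in: tuple[float, float]) -> float:
    """Pixels per inch achieved when the item fills the frame.

    The long edge of the item is aligned with the long edge of the sensor;
    the tighter of the two edges limits the result.
    """
    sensor_long, sensor_short = max(sensor_px), min(sensor_px)
    item_long, item_short = max(item_in), min(item_in)
    return min(sensor_long / item_long, sensor_short / item_short)


if __name__ == "__main__":
    G10_PIXELS = (4416, 3312)  # assumed maximum image size, not taken from this chapter
    ATLAS_INCHES = (15, 17.5)  # the atlas described above
    print(round(effective_ppi(G10_PIXELS, ATLAS_INCHES), 1), "ppi")
```

Under these assumptions the atlas is captured at a little over 220 pixels per inch, well below the 600dpi of the flatbed masters, which is why a lower-resolution camera would have been a poor fit for the largest items.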
Lessons learned from the cradle
While the cradle has been very successful for imaging documents
unsuited for flatbed scanners, determining the best shooting methods
has been an ongoing process and both materials and practices have
been amended as the project progresses. We have switched to a
carefully configured camera setup instead of its uncalibrated default
settings, and have improved the cradle's design.
1 The G10 has recently been replaced by the Canon PowerShot G11, which has a
significantly reduced resolution of only 10 megapixels - a 33% reduction (Butler,
2009, p. 1). While the reduction in resolution was for the purpose of reducing the
amount of noise in images and increasing detail at lower resolutions (p. 17), the G10
performs better and produces more detail at the lowest film speed (pp. 11, 14, 19)
used in digitization. While the G10 was a suitable replacement for a DSLR for this
project, the G11 would not be.
There have been some minor tweaks to its physical construction,
which have helped to significantly improve quality. The presentation
boards work very well as an inexpensive, readily available material.
White presentation boards were initially used because they were the
most readily available medium. However, the Canadian Museum of
Civilization (2007) recommends using black or neutral grey
backgrounds for photography. In our testing, we determined that
using a black background does produce superior results, with lower
noise levels and a superior contrast ratio.
More detailed accounts of our findings, and those of others, can
be found at the DIY Book Scanner website at
http://www.diybookscanner.org/ and at the archivist's personal blog,
located at http://www.mistydemeo.com/ .
Software processing
As Wan (2008) notes, the G10's image tends to be noisy straight out
of the camera, and this is especially noticeable when the camera is
shooting in its JPEG mode. In our experience, noise can be minimized
and detail maximized by shooting using the raw format in well-lit
environments; using appropriate exposure settings; using the lowest
ISO ("film speed") level available; and by using Adobe's Camera Raw
software for noise removal (included with Photoshop) in place of the
Canon Digital Photo Professional software bundled with the camera.
As with the white balance settings, software noise correction
processing can be automated and hence does not adversely add time
to the processing workflow.
Promotion
Both of our sites have been promoted using a variety of methods,
including traditional methods such as print and through special
events, and through non-traditional online methods.
Online
Our passive advertising methods have been primarily focused
online and have centered around findability - ensuring that both of
our sites can be located using search engines such as Google, and in
other places where users may be looking for related information. Our
sites are linked and indexed in a variety of relevant locations,
including our primary library site and the sites of local historical
societies. These have helped to increase their Google ranking, ensuring
that key terms such as "Hiram Capron" and the names of individuals
for whom we have memories appear among the first search results. As
well, we have included links on highly trafficked online resources
which do not contribute to our Google ranking, such as the Wikipedia
articles for Hiram Capron, Paris, County of Brant, and others. Our
statistics show that, in the month of November, approximately 10% of
our visits came through Wikipedia and 20% came through search
engines.
Interlinking between our two sites has also helped to direct
visitors to our other materials; 25% of our visits on one site in the
month of November came through links from the other.
Press
Press releases were distributed to the local media at various
stages of the project, beginning shortly after the digitization work
began; a new press release was distributed every month or two
months. They provided ways of announcing the project and providing
the community with short updates on progress, and also provided
ways to communicate project events to community members. A week
prior to the project's official launch, we also purchased a set of
advertisements in all of the area newspapers.
Local community newspapers were the most amenable to working
with us. All of our press releases were printed by the local County
newspapers, and the community newspaper from Paris sent out a
photographer to document our Digitization Days event.
Unfortunately, obtaining coverage and advertising in the larger
newspaper from a bordering city proved more difficult; not all of our
press releases were printed, and advertising space was substantially
more expensive.
We have found that the rumors of the death of the newspaper are
greatly exaggerated. Print was one of our most effective advertising
methods. Through informal questioning, we determined that most of
those contacting us about the project had read about it through
newspapers. Our advertising space has also proven to be effective. Our
Google Analytics report on the site's first week available to the public
showed a substantial spike of traffic, representing our highest daily
visits to date, immediately after our newspaper advertisements were
printed.
Close to the time of our launch and after it, growing interest in the
project has enabled us to take part in local television and radio; this
has helped promote the project and generated further interest.
Advertising handouts
In addition to our print materials, we prepared a number of
advertising materials to distribute to our partners and directly to
community members.
To advertise our website and specific events, we designed three
eye-catching posters. Two are permanent posters, designed to
generate ongoing interest in the project, while one is a one-time
poster intended to advertise a specific event.
The permanent posters were created to advertise both of our
websites. The primary poster was intended as a project-neutral
advertisement for the site; consequently, it was designed to avoid
specific references to the Capron or Paris projects and image selection
was necessarily constrained to images which were not distinctly
Parisian. The second poster is intended to solicit contributions to the
Our Brant site, and specifically targets an audience which is less
familiar with computers. In both cases, we were able to draw from
photographs in the collection in order to obtain attractive design
elements which could generate further interest in the collection itself.
Due to the expense of having posters professionally printed, we
opted not to design complex posters for most one-time events such as
the Digitization Days event. However, we judged our launch to be
important enough to warrant a professionally printed poster. In
addition to relatively wide distribution to generate interest in the
launch event, it was especially valuable for the purpose of distribution
to the agencies which provided our grant.
The three posters were displayed in all of the branches of the
Library. In addition, they were also distributed widely to project
partners and other locations such as local genealogical societies,
museums, and businesses.
We also produced a set of promotional postcards to be distributed
directly to Library patrons. Both the fronts and backs of the cards
contain information on the websites with their URLs. The cards are
prominently displayed at the circulation desks of all Library branches
as well as our project partner, and clients are invited to take as many
as they would like.
Events
The Library has held two special events in order to help build
interest in the project. The first was a community event called
Digitization Days, which was held on November 13th and 14th, 2009.
It was conceived for the dual purposes of sounding out any additional
material for our project which might be held by community members,
and project promotion by digitizing individual families' photographs
and documents to build interest in digitization and our website.
During this event, community members could bring in their personal
photographs or documents to be scanned and, for a small fee, could
take home the scans on an archival CD.
Much to our surprise, we found that the majority of the
contributors were not interested in taking home any personal digital
copies of the photographs they had brought to be digitized. They had
brought their material specifically for the purpose of allowing it to be
included on our website. In total there were eight contributors over
the two days of the event; of these, five brought material specifically
for our website, and one other was willing to allow his photographs to
be posted after its purpose was explained. A total of 22 new items
were obtained for our two websites during the event. The largest
number of contributions came from one family who brought 15
original historical land leases, 9 of which have been included on our
website.
This revealed a flaw in our planning. While volume of
contributions had been considered, it had seemed very unlikely that
there would be a surfeit of relevant materials for inclusion on our
website. Consequently, our planning had focused on limiting the
number of contributions per person. Fortunately, contributors were
evenly spread across the day; in the future, we will ensure that all
relevant material can be digitized in the event that the number of
contributors exceeds capacity.
The Digitization Days event proved to be a success. The most
valuable outcome was a new contact who owns a large untapped
collection of original historical documents.
Our project's launch event was held on December 6, 2009. We
invited members of the community to attend and see an introduction
to our two digital collections websites. As entertainment, we provided
a slide show with photographs from the collection. In addition, the
president of the historical society we partnered with came in full 19th
century costume in character as "Hiram Capron." We sent Victorian-
styled invitations to a wide variety of recipients, and attracted visits
from our Member of Provincial Parliament and the County's mayor.
Conclusion
While the two Digital Collections sites have now officially
launched, development is very much ongoing. The materials available
have acted as catalysts for community interest and involvement.
Based on the current interest the project has generated, the Library
has continued its work with the Paris Museum and Historical Society
for additional projects and partnered with new organizations to
document the history of other communities in the County. By
digitizing the wide body of materials described, and opening doors
within the community, the Capron project has helped lay the
foundation for a digitization program which can document the history
of all of the County's communities.
References
Adobe. (2009). Digital Negative (DNG). Retrieved Dec 8, 2009, from
http://www.adobe.com/products/dng/
Atiz. (n.d.). Atiz BookSnap: It's a book ripper. Retrieved from
http://www.atiz.com/brochure/booksnap.pdf
Baker, T., et al. (2009). Collaborative history - Creating (and
fostering) a wiki community. In J. Trant and D. Bearman (eds).
Museums and the Web 2009: Proceedings. Toronto: Archives &
Museum Informatics. Retrieved from
http://www.archimuse.com/mw2009/papers/baker/baker.html
Butler, Richard. (2009, December 16). Canon PowerShot G11 Review.
Digital Photography Review. Retrieved Dec 16, 2009, from
http://www.dpreview.com/reviews/canong11/
Canadian Museum of Civilization Corporation. (2007). Digitization
Standards for the CMCC: Scan and Artifact Photography.
Retrieved May 17, 2009, from
http://www.chin.gc.ca/ATutor/bounce.php?course=29
Canon. (2008). PowerShot G10 Camera User Guide. Japan: Canon.
Clements, Maureen. (2009). The secret of Google's book scanning
machine revealed. As a Matter of Fact Blog: NPR. Retrieved from
http://www.npr.org/blogs/library/2009/04/
the_granting_of_patent_7508978.html?sc=fb&cc=fp
DxO Labs. (n.d.). Compare cameras. DxO Mark. Retrieved Dec 11,
2009, from http://www.dxomark.com/index.php/eng/Image-
Quality-Database/Compare-cameras/(appareil1)/247|0/
(appareil2)/334|0/(appareil3)/319|0/(onglet)/0/(brand)/Canon/
(brand2)/Canon/(brand3)/Canon
Epson. (2009). Epson GT-20000. Retrieved Nov 3, 2009, from
http://www.epson.com/cgi-bin/Store/consumer/consDetail.jsp?
BV_UseBVCookie=yes&oid=63075503
Epson. (2009). Perfection V500 Photo Color Scanner. Retrieved Nov
3, 2009, from http://www.epson.com/cmc_upload/0/000/142/
325/V500_InfoSheet.pdf
Hess, Richard L. (2009). Re: NEF images. Aug 13, 2009. ARCAN-L.
http://www.mailman.srv.ualberta.ca/mailman/private/arcan-
l/2009-August/006136.html
Holden, Frederic A. (1859). Genealogy of the Descendents of Banfield
Capron. Retrieved Dec 8, 2009, from
http://books.google.ca/books?id=qLHTscb_XEwC
Holden, Frederic A. (1859). Genealogy of the Descendents of Banfield
Capron. Boston: Geo. C. Rand & Avery. Paris Museum and
Historical Society, 2005.0044.02.
King Township Public Library. (2008). Timeless king online.
Retrieved Dec 14, 2009, from http://www.king-
library.on.ca/heritage.php
Knowledge Ontario. (2009). VITA [3.3] user manual (Version 1.0).
Retrieved from http://our-ontario-contributor-network.near-
time.net/files/vita3-3_usermanual.pdf
Qua, George. (2009, October 12). Qua family name. Ulster ancestry
genealogy & ancestry forums. Retrieved Dec 10, 2009, from
http://www.ulsterancestry.com/forums/viewtopic.php?t=2801
Ristech. (2009). Kirtas APT BookScan 1200. Ristech. Retrieved Nov
3, 2009, from http://www.ristech.ca/kirtas-apt-bookscan-
1200.html
Semantic MediaWiki. (2009). Help:Inline queries. Retrieved Dec 1,
2009, from http://semantic-mediawiki.org/wiki/
Help:Inline_queries
Statistics Canada. (2006). 2006 community profiles - Brant.
Retrieved Dec 14, 2009, from http://www12.statcan.ca/census-
recensement/2006/dp-pd/prof/92-591/details/page.cfm?
Lang=E&Geo1=CSD&Code1=3529005&Geo2=PR&Code2=35&Dat
a=Count&SearchText=brant&SearchType=Begins&SearchPR=01&
B1=All&Custom=
Torborg, Wayne. (2008). Manuscript digitization at the Hill Museum
& Manuscript Library: Building on the legacy of microfilm.
Microform & Imaging Review. 37, 1, pp. 17-27, ISSN (Print) 0949-
5770, DOI: 10.1515/mfir.2008.002, Winter 2008
U.S. National Archives and Records Administration. (2004).
Technical guidelines for digitizing archival materials for
electronic access: Creation of production master files - raster
images. Retrieved from http://www.archives.gov/preservation/
technical/guidelines.html
Van Ballegooie, Marlene, and Duff, Wendy M. (2001). RAD revealed:
A basic primer to the rules for archival description. Ottawa:
Canadian Council of Archives.
Wan, Don. (2008). Canon PowerShot G10 review. Digital
Photography Review. Retrieved Dec 2, 2009, from
http://www.dpreview.com/reviews/CanonG10/
Wikia. (2009). Help:Rich text editor. Wikia. Retrieved Nov 27, 2009,
from http://help.wikia.com/wiki/Help:Rich_text_editor
Zurbu. (2009). Local history and development forums on Zurbu.
Retrieved Dec 4, 2009, from http://zurbu.net/
The Mass. Memories Road Show: A State-
Wide Scanning Project
Joanne Riley and Heather Cole
(University of Massachusetts Boston)
Abstract
Running a state-wide digital history project on a shoestring budget
and staffed primarily by volunteers is not only possible, but brings
immeasurable rewards for the contributors, volunteers, organizers
and staff while gathering priceless documentation of their communal
heritage. The Mass. Memories Road Show
(http://www.MassMemories.net) is a public scanning project based at the
University of Massachusetts Boston which partners with local
communities to digitize family photographs and stories at public
events with the goal of creating a digital portrait of all the 351 cities
and towns in the Commonwealth. This article describes how the
project works to ensure broad participation in the planning and
execution of the project, as well as a detailed description of the
logistics of a Road Show event, which could be replicated in other
communities.
Keywords: Archives, Community history, Community, Digital
collection, Digitization, Family, History, Libraries, Local history,
Massachusetts, Memories, Multi-cultural, Partnership, Photos, Place-
based, Public history, Public, Scanning, State studies.
224 The Mass. Memories Road Show
Introduction
The Mass. Memories Road Show (MMRS) is an ongoing, on-the-spot
public scanning project in the Commonwealth of Massachusetts,
sponsored by the Joseph P. Healey Library at the University of
Massachusetts Boston and Mass Humanities, the state humanities
council. The MMRS documents Massachusetts people, places and
events through the contributions of individuals who bring their photos
and stories to be digitized at public events throughout the state. Over
the next few years, the project will partner with hundreds of local
organizations to visit each of the 351 communities in Massachusetts,
gradually building up a self-portrait of the Commonwealth through
the contributions of its residents. The MMRS is online at
http://www.MassMemories.net.
Project Background
The MMRS grew out of the place-based education initiatives of the
University of Massachusetts Boston's "Massachusetts Studies Project"
(MSP), which provides resources for Massachusetts teachers and
students in the areas of local history, culture and environmental
studies. A series of casual brainstorming sessions with librarians, MSP
board members, teachers and local historians coalesced in a vision of
a public history project inspired by elements of PBS's Antiques
Roadshow (people bringing their personal treasures to a local event
for professional perusal) and the Library of Congress' American
Memory Project (a library organizing digitized images from a common
heritage to be shared on the World Wide Web, see The Library of
Congress, n.d.). The subsequent development of the MMRS project
was guided by the work of Daniel Cohen and Roy Rosenzweig (2005),
Stuart Lee and Kate Lindsay (2009) and by the writings of Robert
Putnam and others about the nature of civic engagement. We have
borrowed elements from a number of superb online digital history
projects, among them the Maine Memory Network (Maine Historical
Society, 2009), for its richly productive collaborations with partnering
organizations across the state, The Organic City (The Organic City,
n.d.) for its place-based communal storytelling approach, the Coney
Island History Project (Coney Island History Project, n.d.) for its
effective application of social web tools, the Charlestown Digital Story
Project (UMBC Digital Story, n.d.) at UMBC for its engaging
multimedia oral histories created through student-elder
collaborations, the Worthington Memory Project (Worthington Public
Library, 2002) for its transparent application of best practices in
indexing, and Orlando Memory (Orlando Memory, n.d.) for its
solicitation of public contributions to a community history project.
The MMRS project was originally designed to meet two goals:
collecting digital surrogates and personal annotations of locally held
primary sources that document people, places and events in
Massachusetts; and developing a searchable online repository of
sources that could be used for educational purposes at all levels. As
the project developed, we discovered that it met another important
need that has been incorporated as a key goal of the project:
community building. Road Shows have turned out to be deeply
engaging community events that connect people within the
community to each other and to others throughout the state, and have
proven to be meaningful in lasting ways to the people who contribute
and to those who volunteer. At its best, the MMRS seems to generate
both of the types of productive social relationships that Robert
Putnam describes in Bowling Alone: the Collapse and Revival of
American Community (Putnam, 2000); that is to say, bonding social
capital (holding together people who see themselves as being similar
in social identity) and bridging social capital (bringing together people
who consider themselves to be "unalike" in some aspect of social
identity). In the words of a Road Show volunteer and contributor:
"[The Road Show] brings the community together... [It]
acknowledges to people that they are a part of the history-
making... It brings people to understand and respect different
cultures in their town through the old and new pictures... I think
it is one of the great community programs to encourage the
whole family to be involved." (MMRS Letters of Support, 2008).
Since its launch in the fall of 2004, the MMRS project has
organized Road Show events across the state, in the process digitizing
thousands of photos and stories, collaborating with dozens of
community organizations and generating the practical lessons we will
share in this essay.
Guiding Principles
The following key principles have guided the development of the
MMRS since its inception:
First, we believe that everyone who lives or lived, works or
worked in a given community is equally representative of that
community. Thus, the Road Show's on-the-ground events and the
resulting digital collection are intended to reflect as realistic a picture
as possible of the community, based on demographics past and
present. For example, for a community that was predominantly Irish
in the early 20th century and is now largely Vietnamese, the event
attendees and the photos and stories in the digital collection should
reflect both of those cultures.
A second guiding principle of the MMRS project is the idea that
we are not creating a comprehensive archive of Massachusetts' (or a
given community's) history; rather, we are creating a self-portrait of
the state based on the photographic artifacts that are held in
individuals' personal photo albums and scrapbooks. Thus, the photos
that individuals bring to the event reflect their choices of how they
want to represent themselves and their families in the project archive,
as opposed to meeting a predetermined collecting standard. This leads
naturally to the project protocol that contributors caption and
describe their photos in the first person, rather than in impersonal
archival language. For instance, one Road Show image's caption and
description read:
"The Reading Theater Circa 1924: This is the second theater in
Reading built by my great uncle, Ed Turnbull. He also built the
first in 1913. My father, Arthur Kelley, was a popcorn boy. We
moved to Reading because my father loved it after spending so
much time here." (Mass. Memories Road Show, 2010; Image ID
41.106.1).
Another guiding principle is that the Road Show events
themselves play a vital role in creating an understanding of communal
history. Particularly in large, diverse cities, individuals and
organizations may not be aware of the larger historical and
contemporary context of their community. Yet all of the people
arriving at their local Road Show with photos in hand have in
common their connection to that physical location and its history,
regardless of their age, ethnic, economic or religious background; all
"belong" simply by virtue of their involvement (present or past) in this
place. At the Road Show events, each person's connection to the
community is formally acknowledged through their presence there.
The events help build connections between contributors and
contributing organizations and the digital collection serves to mirror a
community back to itself. This principle of inclusion informs all
planning processes and influenced our early decision to arrange the
public events so that all participants could observe and share in the
contributions of others. For example, we project all photos on a wall
screen as they are being scanned, and arrange seating so that
contributors can listen to others' stories as they are being recorded.
A final, later-adopted guiding principle of the MMRS is that the
contributor is as worthy of documentation as the contribution. We
hope that our digital collection will serve as a resource of primary
source material well into the future. As such, we came to understand
that the participation of Road Show contributors and volunteers was
also worthy of documentation. Each person who takes part in a Road
Show has shown a commitment to their community history that we
realized should be captured and preserved. Thus, we include
"Keepsake Photos" (described below) and staff / volunteer group
photos in the dataset for each Road Show.
These principles formed the foundation for the development of
the MMRS project and are, we feel, responsible for the extraordinary
atmosphere of cooperation and community that participants have
mentioned in describing the Road Shows:
"The Road Show connected people of all ages to the past,
resulting in a deeper understanding of the present. [The] Road
Show modeled for youth how to work together to share the
history of everyone's history. Through involvement in the Road
Show, a torch was passed to [our] youth, the torch of stewardship
of their community's history." - a Road Show
organizer/contributor (MMRS Letters of Support, 2008).
"Many participants stayed and shared their photographs with
those sitting next to them, watched others being videotaped, or
watched the images being projected onto the overhead screen.
The interactive nature of the event allowed participants to share
stories and learn about other communities in their city." - a Road
Show organizer (MMRS Post-Event Participant Survey, 2009).
The Massachusetts Model
From the beginning, we aimed to create a model project that could be
replicated beyond Massachusetts. In the second year of the project, we
consulted with archival, legal and information technology specialists
to ensure that the database structure and metadata collection forms
met accepted standards and best practices for digital history projects
and that the online resources would be available for educational use.
These specialists' recommendations were compiled in The Mass.
Memories Road Show Handbook: Procedures and Protocols for a
Public Scanning Project (University of Massachusetts Boston, 2006),
available on the project website. They guided us in creating an open-
source database system that stores the information from the
Contributor and Photo forms in fields that are compatible with the
Dublin Core metadata set, and in ensuring that the images collected
may be made available online on a non-exclusive basis for non-
commercial uses.
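As a rough illustration of what "compatible with the Dublin Core metadata set" can mean in practice, the sketch below maps a handful of form fields onto Dublin Core elements. The fifteen element names are those of the Dublin Core Metadata Element Set; the form field names (caption, place, image_id and so on) and the mapping itself are assumptions made for this example and are not taken from the actual MMRS forms or database.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical Photo Form field names (not taken from the MMRS forms),
# each mapped to the Dublin Core element it would populate.
FORM_TO_DC = {
    "caption": "title",
    "description": "description",
    "contributor_name": "contributor",
    "date_taken": "date",
    "place": "coverage",
    "image_id": "identifier",
}


def to_dublin_core(form):
    """Convert one Photo Form (a dict of strings) into a Dublin Core record."""
    record = {}
    for field, value in form.items():
        if field not in FORM_TO_DC:
            raise KeyError(f"no Dublin Core mapping for form field {field!r}")
        text = value.strip()
        if text:  # skip fields the contributor left blank
            record[FORM_TO_DC[field]] = text
    return record


# Caption and image ID borrowed from the example quoted earlier in this essay.
example = to_dublin_core({
    "caption": "The Reading Theater Circa 1924",
    "contributor_name": "",
    "image_id": "41.106.1",
})
```

Keeping the mapping in a single table means a form can be redesigned without touching the stored records, as long as each new field is still assigned to one of the fifteen elements.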
Road Show Logistics
Funding
The early years of the MMRS project's development were partially
supported by grants from the Massachusetts Historical Records
Advisory Board, the Mass. Foundation for the Humanities, and a
Professional Development grant program at the University of
Massachusetts Boston. These grants, totaling approximately $30,000
over three years, covered some initial equipment, student work, and
expenses for pilot Road Shows. The majority of the project's support
in its early years was in the form of extensive contributions of time
and equipment from the project director, volunteers and community
collaborators. Thanks to the track record built up through the early
Road Shows, the project reached a turning point in 2006, when it was
brought under the umbrella of the University of Massachusetts
Boston's Joseph P. Healey Library as a community outreach initiative,
a step which allowed for the hiring of a public historian as a project
manager (.5 FTE) to focus on growth, stabilization and outreach.
In 2008, the state humanities council, Mass Humanities, joined
with UMass Boston's Healey Library to co-sponsor a "mini-grant"
program to expand the Road Show. The program accepts applications
from interested communities who are required to assemble a local
planning team and outline how they plan to achieve a realistic
community portrait through their outreach and marketing. If accepted
for funding, the planning team handles local outreach and provides
volunteer staff, translators, non-technical equipment and
refreshments on the day of the event, while the grant funds
professional videography, stipends for event staff and post-production
of images, data and videos. The Healey Library provides technical
equipment and in-kind staffing for preparation, volunteer training,
website application support, consultation with the University
Archivist, running the Road Show event and follow-up.
Planning
Planning for a Road Show begins as much as a year in advance of
the event. Because our goal is to make these events and the materials
collected a realistic self-portrait of the community, we involve as many
local organizations as possible in the planning process: public
libraries, historical societies, public schools, cultural and ethnic
organizations, youth groups, historic preservation organizations,
business people, government officials, genealogists, senior citizens
and others.
Ensuring participation by those who are sometimes under-
represented in local history projects (young people, non-English-
speakers, new arrivals and other minority members of the
community) is most dependably accomplished when the local
planning team is comprised of persons representing the same range of
backgrounds as the community at large. While this representation is
critical for the Road Show's success, it is equally important that local
planning team members share a genuine interest in cultural heritage,
archives, genealogy, family history, community history and
community building. Too often, "diversity" in planning committees is
arrived at by requesting the participation of municipal or
organizational representatives who are routinely tapped to represent
their communities in all sorts of projects, regardless of their personal
interests. We have learned that putting extensive effort into finding
true kindred spirits across a community's various subgroups - those
whose eyes light up at the mention of "old pictures" or "family stories"
or "local history" - can pay huge dividends in the overall ease of
encouraging broad participation in the eventual Road Show.
The local planning team is responsible for outreach to the
community to recruit contributors to the event. We have found that
the best-attended Road Shows were invariably the result of person-to-
person outreach by the local organizing team which, while time-
consuming, was much more effective than general publicity or
postings around town.
One community member described her participation in a local
planning team as valuable both personally and professionally:
"The Mass. Memories Road Show is a vehicle not only to
showcase communities but helps to bring them together. I was
able to observe how members of various and diverse institutions
came together to put on a community event. I met leaders of the
Asian community, historic/cultural institutions and veteran
neighborhood organizations" - MMRS local organizing
committee member (MMRS Letters of Support, 2008).
Staffing
The permanent MMRS staff consists of a part-time project
manager who oversees all aspects of the project, and the MSP director
who serves in an administrative and advisory capacity. Both positions
are located within the Digital Library Services Department of the
Healey Library at the University of Massachusetts Boston.
In addition to the permanent MMRS staff, each Road Show event
is staffed by 15-20 people: a combination of experienced "Roadies"
(people who previously volunteered at a Road Show in their own
community) and local volunteers. Roadies receive a small stipend (ca.
$50) for each subsequent Road Show they work; in other words, staff
are not paid to work in their own community but are compensated to
bring their experience and expertise to another Road Show in the
state. Local volunteers are recruited by the local planning team and
need not have any professional experience in archives, local history or
digital projects. We have been pleased to have skilled amateur
photographers, genealogists, librarians, teachers, historians and
computer buffs among our volunteers and Roadies.
One of the many unexpected outcomes of this project has been
the camaraderie built up among Roadies across Road Shows, and the
cross-fertilization that occurs through the activities of people from
different communities traveling to help others document their
community heritage. Some Roadies have been with the project for
several years, and the number grows with each event, since all
contributors and volunteers are routinely invited to help staff
subsequent Road Shows.
Staffing an event is apparently equally rewarding for the Roadies,
who often speak of feeling privileged to bear witness to the stories
shared by event contributors:
"Once the event began, I was touched in many ways. Some
stories brought a tear to my eye, like one woman's description of
how her brother was a prisoner of war for six months in World
War II, and the hurt and pain she felt, but the comfort his
picture brought her... Working on the project is by far the most
rewarding venture I have been involved in so far." - MMRS
volunteer, Roadie and videographer (MMRS Letters of Support,
2008).
Training
All of the Road Show functions aside from videography - i.e.
photo scanning, metadata collection, still photography and customer
service - are performed by amateur Roadies and volunteers who are
trained just prior to the event. The MMRS staff provides written
instructions and conducts the training for all Roadies and volunteers.
In the past, volunteer training had been scheduled two days
before the Road Show, which allowed time to run the training along
with an interim day to fix any problems that might be identified
during the training session. However, it proved difficult to assemble
all of the volunteers in advance of the event, so subsequent Road
Shows have incorporated an hour-long training immediately before
the event, when all volunteers for the day are present. While this
worked reasonably well, we are developing a series of self-paced
multimedia training materials for all volunteers to work through well
in advance of the event. This will improve the efficacy of the training
process and will also bolster the confidence of inexperienced local
volunteers as the day of the Road Show approaches.
AT A ROAD SHOW
Each Road Show is held in a public space, most often a local public
library, historical society or community center. Most Road Shows are
scheduled to last about four hours, with an additional two hours in
advance for setup and an hour for breakdown and cleanup. Each Road
Show consists of "stations", which are tables devoted to specific Road
Show functions. Road Show participants move from one station to
another, providing information, having their photos digitized and
their stories recorded.
Contributors are invited to drop in at any time during the event.
We have found it works best to have a loose and casual atmosphere. If
a contributor arrives under great time pressure, we squire them
quickly through the process. But for the most part, people arrive at the
event, go through the process, and then linger to listen and watch as
others go through the process. Even when the crowds have been
heavy, there has never been any impatience but only excitement and
goodwill. Said one contributor: "I could have spent two hours with all
my photos and stories. I really had a good time" (MMRS Post-Event
Participant Survey, 2009).
Below are details about each of the seven Road Show event
stations, offered in hopes that those who choose to undertake a similar
endeavor may benefit from lessons learned in this project's early
years.
Welcome Station
The Welcome Station is the first stop for all contributors to the
Road Show. Locating the Welcome Station in the corridor or
anteroom outside the event space helps manage crowds and can also
serve as a visual invitation to the event for passersby. The station is
staffed by two "Greeters", generally one experienced Roadie and one
local community member. As contributors approach the table, the
Greeters welcome them to the event, briefly explain the station layout
and give them a numbered nametag and a Registration Form to
complete. The number on the contributor's nametag and form is used
to match each contributor to his/her photos, stories and metadata.
Greeters accept and review the Registration Form, and then invite the
contributors to proceed to the Info Station to complete paperwork on
their photos. If any of the contributors need assistance with
translation, mobility or other challenges, then a floating volunteer is
summoned to assist them as they proceed from station to station.
Equipment: 4-6' table and chairs; pre-numbered stick-on
nametags; registration forms; clipboards; pens (all forms must be
signed in pen).
Info Station
At the Info Station, contributors complete paperwork on each of
the photos they are contributing to the MMRS digital collection. The
Info Station is staffed by three to five "Reviewers." Their role is to
assist contributors in selecting a few photos that represent themselves,
their family and/or their community, and assist them in completing a
Photo Form for each photo. Reviewers encourage contributors to
select photos that best fit one or more of the following categories: an
original photo (rather than a printout or photocopy), a photo where
they know some identifying details (where taken, when, people
photographed, etc.), a photo that has particular meaning to the
contributor, a photo that is unusual in some way, or a photo that has
people in it (rather than a scenic shot).
Contributors are encouraged to complete the Photo Form as best
they can, using a first-person approach in providing the "who, what,
when, where" of the photo. The Reviewer then reviews the Photo
Form and may prompt the contributor to clarify or add more
information (last names, additional details, etc.). Reviewers can add
more information derived from conversation on the back of the form,
if necessary. Once each Photo Form is complete, the Reviewer assigns
a sequential number to each of the contributor's photos, writing that
number on the relevant Photo Form. Finally, the Reviewer invites the
contributor to take their photos and Photo Forms to the Scanning
Station, accompanying the person there if necessary. If there is a
waiting line at the Scanning Station, the contributor may visit any of
the other stations in any order.
Notes: In our experience, reviewers need to actively encourage
contributors to use first person ("I, Me, My, Our...") in photo captions
and descriptions. Frequently contributors will bring in an entire
album of photographs and it can take some time to help them sift
through them to select two or three to share with the project.
Contributors frequently do not fill out the Photo Forms completely at
first; it often takes additional coaching or asking questions to
encourage the contributors to share more detailed information about
their photos.
Equipment: Two 6-8' tables; 10-12 chairs; Photo Forms;
clipboards; pens
Scanning Station
At the Scanning Station, the contributor's photographs are
digitized on a flatbed scanner. This station is staffed by two
volunteers/Roadies: a "Scanner" and a "Reviewer." The Reviewer
collects the Photo Forms and performs another quality control check
for completeness, legibility and correctness of the ID number. The
Scanner scans each photo at 300 DPI and saves it as a TIFF file named
using the format RoadShow#.Nametag#.Photo#.tif. This filename is
also recorded on the Photo Form. The scanning process is projected
live on a wall screen for all event visitors to see, which enhances the
spirit and enjoyment of the event. After the photos are scanned, they
are immediately returned to the contributor, while all of the
paperwork is collected by the Reviewer and stored in a large envelope
at each station.
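The naming convention above can be sketched in a few lines of code. This is a minimal illustration, not the project's own software: the function and class names are hypothetical, and we assume each of the three parts is a plain whole number, as in the Image ID 41.106.1 cited earlier.

```python
import re
from typing import NamedTuple


class ImageId(NamedTuple):
    """The three numbers that identify one scanned photo."""
    road_show: int
    nametag: int
    photo: int


# Accepts "41.106.1" as well as "41.106.1.tif".
_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.tif)?$")


def build_filename(road_show, nametag, photo):
    """Return a filename in the form RoadShow#.Nametag#.Photo#.tif."""
    return f"{road_show}.{nametag}.{photo}.tif"


def parse_image_id(name):
    """Split a filename or image ID back into its three numbers."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a RoadShow#.Nametag#.Photo# name: {name!r}")
    return ImageId(*(int(part) for part in match.groups()))
```

Because the nametag number is embedded in every filename, a check of this kind could flag a mistyped name at the Scanning Station, before the photo is handed back to its contributor.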
If a photo is too big or unwieldy to fit on the scanner, or if, as
sometimes happens, a contributor brings a three-dimensional object
(we have seen aprons, clocks, weavings, jewelry and more), the
contributor is sent to the Keepsake Photo Station, where the digital
camera is used to photograph the oversized item.
Notes: Each Scanning Station can handle between 70 and 100
photos within the event time. For most Road Shows to date, this has
meant setting up two Scanning Stations.
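For planning purposes, the throughput figures above reduce to simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, the default of three photos per contributor reflects the "two or three" photos contributors are asked to select, and the attendance figure in the usage note is invented.

```python
import math


def stations_needed(contributors, photos_each=3, photos_per_station=70):
    """Estimate how many Scanning Stations an event needs.

    photos_per_station defaults to 70, the low end of the 70-100 range,
    so the estimate errs on the side of too many stations.
    """
    if contributors < 0 or photos_each < 0 or photos_per_station <= 0:
        raise ValueError("counts must be non-negative and capacity positive")
    expected_photos = contributors * photos_each
    # Always set up at least one station, even for a very small event.
    return max(1, math.ceil(expected_photos / photos_per_station))
```

With a hypothetical turnout of 50 contributors, stations_needed(50) gives 3 at the cautious rate of 70 photos per station, while stations_needed(50, photos_per_station=100) gives 2 at the optimistic rate.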
Projecting the scanning process onto a wall screen not only helps
to build community, but also is an engaging way for those waiting
their turn at one of the stations to pass the time. There have been a
number of exciting moments at Road Shows when someone recognizes
a person or place that flashes on the screen and adds their story to the
mix.
Equipment: laptop computer; scanner (capable of scanning
letter-sized documents and photographs as tiffs at 300 DPI) and
necessary drivers to connect scanner to laptop; LCD projector;
portable projection screen; paperclips; blank CDs for backup; two 6-8'
tables and chairs; large envelopes to hold collected photo and
contributor forms and backup CD.
Keepsake Photo Station
The Keepsake Photo Station is staffed by a "Photographer" and a
"Paperwork Manager." The Photographer snaps a photo of the
contributor holding one (or more) of the contributed photos, from a
distance of about four feet, and framing from the waist up. The
Photographer then prints out the photo as a souvenir for the
contributor to take home with them, while the digital version is saved
for later incorporation into the online database. The Paperwork
Manager is responsible for completing the Keepsake Photo Log with
the name, nametag number and other relevant information which is
transferred to the image file of the keepsake photo after the event.
Notes: The Keepsake Photo Station also serves as the station for
photographing images and objects that are too large to fit on the
scanner. Because the printing process can be slow, contributors are
invited to visit other stations while their keepsake photo is being
generated. Printed photos are posted onto a bulletin board for later
pickup.
Equipment: digital still camera; tripod; keepsake photo log; rich
colored backdrop; photo printer with paper and ink (capable of
printing 4x6" prints from digital camera onsite); bulletin board and
thumbtacks; table for printer; chairs
Video Station
At the Video Station, contributors are asked to share a three- to
five-minute story about one or more of the photos they contributed to
the project. The station is staffed by a Videographer and an
Interviewer, although these can be the same person, depending on the
experience of the staff. The Interviewer begins by asking the
contributor to state their name and nametag number on camera for
data quality control purposes, and then invites them to tell their story.
Clarifying questions may follow, in a casual, conversational style.
Notes: While trained amateurs can achieve high quality scanning
results, videography is still best left to professionals, or at least those
with extensive experience in videotaping in public settings under time
and quality pressures. Public access television stations in each
community have provided experienced staff for this important
function.
Equipment: digital video camera; appropriate lighting, lavaliere
microphone and other equipment for recording interviews;
photogenic chair with armrests for interviewees to sit in; backdrop,
preferably one reflective of the community.
Preservation Station
The Preservation Station includes a display and handouts on how
to care for family photos and documents, both contributed by
volunteers from the New England Archivists. Contributors can stop by
this table to peruse the displays, pick up information and ask
questions of the archivists. Road Show volunteers direct contributors
to the Preservation Station if they notice that a contributor's artifacts
are delicate or in poor condition, or in response to questions from
contributors about preservation, valuation or potential repositories for
their artifacts.
Notes: Despite the Preservation Station's alluring display of
"Archival Evils" showing the dramatic effects that mold, moisture,
heat, glue, etc. can have on precious family artifacts, contributors have
tended to pass by this station, perhaps because it is not directly
related to the digitization processes. To mitigate this, we now place
the Preservation Station closer to the scanning action, and encourage
the archivists to wander the room and offer advice as moved to do so
by what they see among the contributions.
Equipment: 6-8' table and chairs; display materials and
handouts about photo preservation, preferably supplied by
professional archivists
Local History Station
The Local History Station provides consultation and resources on
local history. This station is usually staffed by representatives from the
local historical society or reference librarians who are broadly
knowledgeable about the history of the area and its residents over
time. The Local History Station includes a display on local history and
local history resources (books, pamphlets, maps, etc.). Local
historians/librarians also keep an eye out for any photos that they
may want to solicit as a donation to their archives.
Notes: The volunteers at the Local History Station are able to
answer local questions, and identify persons, places and dates in
contributors' photos. These are also the people who will be carrying on
aspects of the project into the future. Usually the local groups who
staff this station come away with new member signups to their
organizations.
Equipment: 6-8' table and chairs; brief written history of the
town in pamphlet form; other relevant local history information
(books, maps, etc.)
Post-Event Processing
Based on our own experience and the recommendations of
professional consultants, we developed a routine for managing the
wealth of information gathered at each Road Show. At the end of each
event, we back up the digital images onto a portable hard-drive and
make photocopies of all the paperwork. Within a few weeks of each
event, MSP staff members perform the data-entry and image
processing necessary to include the images and metadata in the online
database. Once data entry is complete, we send digital copies of all
materials to each partnering organization for that event. We also send
out a brief survey to participants, send contributor contact
information to all partners, and finally, notify all participants when
the photos and videos have been mounted on the project website at
www.MassMemories.net.
351 and Beyond
In its first five years of operation, the MMRS has visited only a
fraction of the 351 cities and towns across Massachusetts and has
collected just a portion of the millions of photos and stories that still
lie in the basements, attics, scrapbooks and photo albums of its
residents. We have a long way to go and, like many other projects,
continue to seek ways to reach a broader audience, ensure long-term
sustainability, and reduce operational costs. To that end, we are in the
process of converting the project's current database and website into a
web-based data-management system to streamline data- and image-
processing procedures.
At Road Shows we have seen high school students and retirees
share stories about how their common childhood neighborhood has
changed over time. We have heard former residents from opposite
sides of what had been a racially-segregated housing development
recall overlapping events from their childhoods. Parents have come to
Road Shows with their children to share photos of their immigrant
ancestors. After the events, we see communities continuing the
connections and conversations that were started at Road Show events
through local history exhibits, extended documentaries about local
residents, and new partnerships among local organizations. The
ultimate lesson learned? Running a state-wide digital history project
on a shoestring budget and staffed primarily by volunteers is not only
possible, but brings immeasurable rewards for the contributors,
volunteers, organizers and staff while gathering priceless
documentation of their communal heritage.
In closing, here are words from some of those who have
participated in the Mass. Memories Road Show to date:
"The students said they learned so much from the families and
individuals who showed up to share photos and tell stories of the
old days... Students began to realize that they will someday be the
elders, the caretakers of their city's, their family's, their culture's
history." - Middle school teacher whose students participated in a
Road Show as contributors and volunteers
"I met many new people and heard many interesting stories
about the history of the community in which I live that I would
have otherwise never heard. I have left the experience of my
involvement with the Mass. Memories Road Show with even
more interest in my community and the tapestry of people that
populate it." - Road Show volunteer/contributor.
"I have only worked in three Road Shows and can't wait for
more!" - 8th grade student who worked as a Roadie during his
summer break.
"We heard the story of the lamplighter in North Quincy, learned
that a Quincy resident was Jimmy Dolittle's wingman, and that
people still remember the smell of fresh bread at the Sumner
Bakery. These footnotes don't appear in standard histories, but
now they are on record for all to share." - Local history librarian
and Road Show organizer.
"Having spent my entire career in the field of public history, I
cannot think of a project that contributes more to a broader
public reflection and appreciation of the past. The Road Show has
developed a creative, innovative and, perhaps most importantly,
an effective approach to engaging a community in documenting
its own history." - State humanities council representative and
Road Show attendee
The authors would like to thank the following supporters who
helped shape the Mass. Memories Road Show in its earliest days: Ron
Adams, Phil Byrnes, Celeste Finison, George Hart, April Hagins
Johnson, Elizabeth Clancy Lerner, Cagen Luse, Marisa Luce, Mary
McCarthy, Paul McCarthy, Elisabeth McGregor, Hoa Mai Nguyen,
Daniel Ortiz-Zapata, Frances Pollitt, Frank Poon, Barbara (Bobby)
Robinson, Ellen Rothman, Sequoia Stenlund, Emily Sweeney, Earl
Taylor, Rebecca Withers.
References:
Cohen, Daniel J. and Roy Rosenzweig. (2005). Digital history: A
guide to gathering, preserving and presenting the past on the
web. University of Pennsylvania Press. Retrieved March 23, 2010
from http://chnm.gmu.edu/digitalhistory.
Coney Island History Project. (n.d.). Retrieved March 23, 2010 from
http://www.coneyislandhistory.org/
Lee, Stuart D. and Kate Lindsay. (2009). If you build it, they will
scan: Oxford University's exploration of community collections. In
EDUCAUSE Quarterly, vol 32, 2009. Retrieved March 23, 2010
from http://www.educause.edu/EDUCAUSE+Quarterly/
EDUCAUSEQuarterlyMagazineVolum/IfYouBuildltTheyWillScan
Oxford/i74547#TB_inline?height=400&width=630&inlineId=side
bari&modal=false
Maine Historical Society. (2009). The Maine Memory Network, a
project of the Maine Historical Society. Retrieved March 23, 2010
from http://www.MaineMemory.net.
Mass. Memories Road Show. (2010). Mass. Memories Road Show -
Your family's place in Massachusetts History. Retrieved March 23,
2010 from http://www.massmemories.net.
MMRS Letters of Support. (2008). Letters of support submitted for
MMRS nomination for NCPH Public History Award. Unpublished
documents. (January, 2008).
MMRS Post-Event Participant Survey. (2009). Unpublished data.
Orlando Memory (n.d.). Homepage. Retrieved March 23, 2010 from
http://dc.ocls.info/.
Putnam, Robert D. (2000). Bowling alone: the collapse and revival of
American community. Retrieved March 23, 2010 from
http://www.bowlingalone.com/
The Library of Congress. (n.d.). The Library of Congress: American
memory. Retrieved March 23, 2010 from http://memory.loc.gov/
ammem/index.html.
The Organic City. (n.d.). Homepage. Retrieved March 23, 2010 from
http://www.theorganiccity.com/wordpress/.
UMBC Digital Story. (n.d.). Digital stories from Charlestown.
Retrieved March 23, 2010 from http://www.umbc.edu/oit/
newmedia/studio/digitalstories/ctds.php
University of Massachusetts Boston. (2006). The Mass. Memories
Roadshow handbook: Procedures and protocols for a public
scanning project. Retrieved March 23, 2010 from http://www.
msp.umb.edu/MassMemories/handbook/MMRSHandbook.pdf.
Worthington Public Library. (2002). Worthington memory.
http://www.worthingtonmemory.org/index.cfm.
Picturing the Museum: Education and
Exhibition at the American Museum of
Natural History
Kelli Anderson, Barbara Mathe, Eric Muzzy, Stacy Schiff
(American Museum of Natural History Research Library)
Abstract
The Research Library of the American Museum of Natural History
received funding from the Metropolitan New York Library Council in
2007 to produce a web exhibit of 989 historic images. Picturing the
Museum: Education and Exhibition at the American Museum of
Natural History served as the prototype for a comprehensive
database for the Research Library's extensive Photographic Collection.
In the larger context of the history, development, and use of the
Library's Photographic Collection, this article describes the project's
conception through a self-published book produced for the Trustees'
Library Committee, the funded project and the ongoing development
of the larger database. One of the internal goals and results of the
Picturing the Museum project was to analyze, codify, and document
local practice, policies and workflow for more efficient delivery of
images to the web.
Keywords: Cultural metadata, Digital imaging, Digitization,
Exhibition, Libraries, Museum, Museum education, New York,
Photograph, Photographic collection.
The American Museum of Natural History
Photographic Collection
Founded in 1869, the American Museum of Natural History (AMNH)
is one of the nation's preeminent institutions for scientific research
and public education. Throughout its history, the Museum has
pursued joint missions of science and education. The Museum's power
to interpret wide-ranging scientific discoveries and convey them
imaginatively has inspired generations of visitors to its grand
exhibition halls and educated its visitors about the natural world and
the vitality of human culture.
The Museum collections include over 32 million objects and
specimens relating to anthropology, zoology, paleontology and the
physical sciences. From the collected stories, clothing and material
culture of the peoples of the Pacific Northwest Coast to the dinosaur
eggs unearthed in Mongolia, from meteorites brought back from
Greenland on a wooden sailing ship by Robert Peary to one of the
most comprehensive sets of fossil horses ever assembled, the breadth
and variety of the collections is astonishing.
Illustrating the work of the Museum scientists and staff are over
1.5 million black and white negatives, color transparencies, lantern
slides and photographic prints held in the AMNH Research Library.
The majority of these photographs were taken by Museum explorers
and scientists who documented their field work. At the same time,
they photographed the local environment and the people who lived
there. The physical growth of the Museum was recorded by staff
photographers who also photographed the work of the exhibition and
education departments. These images are requested by researchers,
students, educators, and professionals from around the world for
academic and general publication, educational broadcast and
distribution, artists' reference and personal use.
The dissemination of this extraordinary Photographic Collection
can be traced directly to Museum founder Albert Bickmore, one of the
earliest and most enthusiastic advocates of visual education.
Bickmore's lantern slide lectures were so successful that a new and
larger theater was built in 1900 to accommodate the lines of teachers
awaiting admission. To expand the Museum's educational mission
beyond its walls, Bickmore created a lantern slide lending library of
over 140,000 slides. The slide library formed the basis of the Natural
Science Study Collections that were delivered to schools throughout
New York State (see Figure AMNH-1).
Figure AMNH-1: Image 37244, Teachers selecting lantern slides; Image 313944,
School Delivery, American Museum of Natural History Photographic Collection.
In addition to the slides, the Museum delivered specimens and,
later, model dioramas accompanied by lectures prepared by the
Museum's educational and scientific staff. Bickmore's initiative
foreshadowed the digital distribution of images by over a century. It is
now possible to create worldwide access to the Collection and the
public has come to expect to be able to find images online. Organizing,
describing and digitizing a photographic collection of this size,
however, is an enormous undertaking.
Picturing the Museum
To develop internal support within the institution for digitizing the
Photographic Collection, the Library staff selected 50 appealing
images related to the Museum's history and designed a small print-
on-demand photo book to distribute at the January 2007 meeting of
the AMNH Trustees' Library Committee (see Figure AMNH-2).
Figure AMNH-2. Front and back covers of Picturing the Museum (© 2007), American
Museum of Natural History, produced using Blurb.com (www.blurb.com).
At the same time that the book was being developed, the Library
used a similar theme, Natural Science Education and Exhibition at
the American Museum of Natural History, as a digitization project
proposal for METRO's Regional Bibliographic Databases and
Interlibrary Resources Sharing Program. This choice was made
because the parameters of the grant proposal indicated that the
subject matter should relate to the New York City Metropolitan Area.
While most of the AMNH photographs were taken in distant places on
every continent, the images of the Museum and its work were made in
New York City and the surrounding area.
The METRO-funded project, Picturing the Museum: Education
and Exhibition at the American Museum of Natural History (URL:
http://images.library.amnh.org/photos/index.html) was created to
showcase nearly 1000 images from the Photographic Collection, and
to act as a prototype for developing a larger, more comprehensive
image database that would grow over time. The photographs in
Picturing the Museum illustrate how the Museum staff presented
natural science to the New York City public and the reactions of that
public, particularly the school children. The pictures have been used
by scholars studying education, museology, visual representation and
local history. An NYU professor e-mailed the Library describing how
the web site generated a new research topic, an article entitled
"Exhibiting the Exhibit-Makers" about the history of the photographs
of exhibition preparation at the AMNH. She is presently researching
the Museum Archives to find more about the long-neglected Museum
staff photographers in order to define their role historically within the
institution. The pictures on the web site, documenting dinosaurs to
fashion history, also tell a story about the people of New York City as
the twentieth century emerged.
Building upon Albert Bickmore's prescient vision of distributing
images as an educational resource, the long term goal of the project is
to make more images in the Collection readily available to researchers,
scholars, students and the general public. Online access to the
Photographic Collection will expand Bickmore's initiative
exponentially, vastly increasing the numbers of those who will use the
images for research, publication, broadcast and personal use. Students
will be able to use these images in their class presentations and more
individuals will be able to find pictures of their family from previous
generations, like the Ainu woman from Japan, whose grandfather's
photograph from the 1904 St. Louis Exposition is in the AMNH
Photographic Collection. Online access will also help to preserve the
original materials. Before scanning was institutionalized at the
Museum, negatives would be repeatedly pulled from the collection for
printing in the Museum's photographic studio. Now they are scanned
once and left undisturbed, preserved in a climate controlled
environment.
Perhaps the most exciting potential of digital access to the AMNH
Photographic Collection is that it offers the possibility of restoring
provenance to many of the images in the Collection. With negatives
filed numerically upon accession and prints arranged by subject to
facilitate teaching, many collections were disassociated from their
history. Study of historical photographic evidence is infinitely more
valuable when examining a series of photographs instead of a single
photograph out of context.
Sources for the Image Metadata
The basic records for the Picturing the Museum database were
derived from 23 typed logbooks that had been created when the
negatives were originally added to the Museum's collections. The
logbook data reflects but does not consistently duplicate the data on
the negative envelopes or on the photo print file cards that have been
used to provide access to the images (see Figure AMNH-3).
As part of a previous Mellon-funded digital library project, the
logbooks were scanned and then triple-keyed (a process by which the
data was entered by hand three separate times and compared for
discrepancies), an outsourced task. The result was 186,000 raw legacy
records. Despite over a century of errors, anomalies, and
abbreviations, using the legacy data as a source and editing and
correcting that data is much faster than new cataloging. The verbatim
legacy data itself is maintained in a separate field in the record. While
all the data for Picturing the Museum came from the logbooks, it was
important to plan the database design and write cataloging rules to
consistently accommodate the larger Photographic Collection in all its
variety. Fields not necessary in Picturing the Museum and later added
to capture additional data for other collections included English
translation, common name, scientific name, and cultural context.
Images from rare books, expedition photographs, and images of
artifacts would have this information associated with them.
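The comparison step in triple-keying can be sketched in a few lines of Python. This is only an illustration: the chapter does not describe how the outsourced vendor resolved discrepancies, so the majority-vote rule and the function name below are assumptions.

```python
from collections import Counter

def reconcile(keyings):
    """Compare three independent keyings of one logbook entry.

    Assumed rule (not stated in the chapter): a value keyed the same way
    at least twice is accepted; three different values are a conflict
    that a person must resolve against the scanned page.
    """
    if len(keyings) != 3:
        raise ValueError("triple keying expects exactly three entries")
    value, votes = Counter(k.strip() for k in keyings).most_common(1)[0]
    if votes == 3:
        return value, "unanimous"
    if votes == 2:
        return value, "majority"
    return None, "conflict"
```

Under this rule, an entry keyed as "Corean hats" twice and "Corean bats" once would be accepted as "Corean hats" and marked as a majority decision, while three different keyings would be returned for review.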
Figure AMNH-3. Data sources: logbook, back and front of photo file card, and
negative sleeve, American Museum of Natural History Photographic Collection.
Staffing and Workflow
The METRO grant enabled the Museum to employ a part-time
cataloger and a scanning technician for the project. Already on staff,
the AMNH digital services librarian created a prototype web site for
the image repository using Ruby on Rails and MySQL and the digital
photographer created the web site's graphic design and styling. The
Museum archivist managed the project and staff - consisting of
volunteers, project employees, interns, and regular staff working on
the project. The acting director provided expert direction and
oversight. A library school intern with experience working for a stock
photo agency did the image selection by reviewing the photo file cards
and contributed greatly to the organization of the four themes chosen
for the site: Dioramas, Education, Exhibition, and Exhibition
Preparation. Workflow was planned so that scanning began
approximately four weeks before cataloging to maintain the
arrangement of the materials as organized for the project. The team
met regularly to exchange materials and discuss the ongoing
development of the database.
Selection and Staging of Materials
The intern choosing the images reviewed the photographs on the
photo file cards and chose the most compelling. The negatives were
scanned and uploaded to match the logbook data which had been
imported into the database. Lastly, the cataloger edited the logbook
data using the additional information found on the photo file cards
and negative envelopes.
Retrieving the negatives and staging the physical materials for
Picturing the Museum required detailed planning and careful record-
keeping. Several individuals worked with the materials to select,
retrieve, scan, catalog, and re-file the images at different times.
Project staff members in teams of two retrieved the negatives from the
film storage room and grouped them by format to expedite scanning.
Digitizing, describing, organizing, storing, and building a database for
approximately 1000 images and their metadata was a large and
complicated undertaking with a sizable team and many detailed
procedures. The experience gained will be crucial for the ongoing
digitization of the larger Photographic Collection.
Equipment and Technical Specifications
The funding received from the METRO grant allowed for the purchase
of new equipment and software. For the project, the Library
purchased an Epson Perfection V750-M Pro flatbed scanner, a Mac
Pro workstation, an Eizo ColorEdge CG211 color calibration LCD
monitor with hood, and Eye-One calibration hardware. Later in
the project, because moire patterns appeared on 26 images, an
anti-Newton ring glass was purchased to rescan the problem
images. The team developed protocols for maintaining a consistent
color-managed environment to ensure the tonal fidelity of the
scanning and post-processing work. An initial monitor profile was
created with the Color Navigator software and the Eye-One
calibration device, and the monitor was measured against the
profile monthly to ensure that the calibration had not drifted.
Most of the negatives scanned for this project were glass plate.
The black and white negatives were captured in 16-bit grayscale TIFF
files at an archival size equivalent to those recommended in the
National Archives and Records Administration's manual, Technical
Guidelines for Digitizing Archival Materials for Electronic Access:
Creation of Production Master Files - Raster Images. All archival
images were scanned to leave film borders and glass plate edges
intact. The scans were corrected conservatively to make available as
much of the information from the original negatives as possible.
Manual dust and scratch removal was carried out when these
interfered with image viewing. After the tone and texture correction,
all images were converted to 8-bit grayscale and saved to the
Museum's SAN as TIFF files with borders cropped for web
presentation. The JPEG web derivatives are automatically uploaded
and sent into storage alongside the TIFFs.
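The arithmetic behind the 16-bit to 8-bit conversion helps explain why tonal correction was done first: 65,536 possible gray levels collapse to 256, so any adjustment made afterward has far less data to work with. The sketch below shows one common way to scale a single sample with rounding. The chapter does not say which software performed the conversion, so the function is an assumption for illustration only.

```python
def to_8bit(sample16):
    """Scale one 16-bit grayscale sample (0..65535) to 8 bits (0..255).

    Uses integer arithmetic and rounds to the nearest 8-bit level.
    """
    if not 0 <= sample16 <= 65535:
        raise ValueError("expected a 16-bit sample")
    return (sample16 * 255 + 32767) // 65535

# Black and white map to the ends of the 8-bit range.
darkest, brightest = to_8bit(0), to_8bit(65535)
```

Because roughly 257 adjacent 16-bit levels share each 8-bit value, fine tonal distinctions in the archival master are not recoverable from the service copy.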
The post-processing work procedures, based on the National
Archives and Records Administration's best practices, are
documented on the Picturing the Museum web site. Additionally, the
technical information section of the project web site includes a
detailed guide created by the Library's digital photographer,
http://images.library.amnh.org/photos/imageworkflow.html.
Conditions of Use
The American Museum of Natural History endorses a policy that
makes images in the Photographic Collection freely available to the
research community to the fullest extent and as soon as possible. At
the same time, it is recognized that access to some images must be
restricted to protect the privacy of individuals and to respect various
cultural traditions. An individual's image will be removed from the
site upon request from that individual. Non-commercial users may
download the files for their own use provided that they cite the
Museum as the source of the image. No one may alter or modify the
content without written permission from the Museum.
Terms for commercial use of the images are detailed on the
Picturing the Museum web site. These terms were written and
subsequently approved by the Museum's General Counsel and
composed so that they would refer to the Museum's Photographic
Collection as a whole, not just to the Picturing the Museum site.
Development of the Database System
The database for Picturing the Museum was designed to include
search and browse features and an efficient cataloging interface, and
to enable sharable metadata for interoperability. Metadata values and
collection tables for the Picturing the Museum database were loosely
modeled on the DSpace version 1.41 database schema. This design
offered an opportunity to create multiple collections with distinct
metadata schemas. Because the Library had different fields for
different collections, instead of creating separate tables, metadata
values were appended to a single table with a unique key to identify
the field names for each collection. This design offered streamlined
storage and a simple approach to dealing with repeating fields. The
Library used MySQL to manage the images and metadata because it is
a widely used database environment in the open source community,
and was, at the time, the primary production database for Ruby on
Rails web sites.
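A single values table keyed by field can be sketched as follows. The sketch uses Python's built-in SQLite module as a stand-in for MySQL, and every table name, column name, and sample value is hypothetical; the chapter does not publish the Library's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fields (              -- one row per field name per collection
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    name TEXT NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id)
);
CREATE TABLE metadata_values (     -- values for every collection live here
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items(id),
    field_id INTEGER NOT NULL REFERENCES fields(id),
    value TEXT NOT NULL
);
""")
conn.execute("INSERT INTO collections VALUES (1, 'Example collection')")
conn.executemany("INSERT INTO fields VALUES (?, 1, ?)",
                 [(1, "Title"), (2, "Geographic location")])
conn.execute("INSERT INTO items VALUES (1, 1)")
# A repeating field is simply two rows with the same item and field.
conn.executemany(
    "INSERT INTO metadata_values (item_id, field_id, value) VALUES (?, ?, ?)",
    [(1, 1, "Example title"), (1, 2, "China"), (1, 2, "Myanmar")])

def values_for(item_id, field_name):
    """Return every value of one field for one item, in entry order."""
    rows = conn.execute(
        """SELECT v.value FROM metadata_values v
           JOIN fields f ON f.id = v.field_id
           WHERE v.item_id = ? AND f.name = ?
           ORDER BY v.id""", (item_id, field_name))
    return [r[0] for r in rows]
```

Adding a new field for a new collection then means inserting a row into the fields table rather than altering a table definition, which is the flexibility the design described above was after.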
Preliminary data clean-up of the logbooks was conducted using
SOLR, which is based on Lucene, a Java implementation of an
enterprise search and retrieval library (Welcome to Solr, 2010). The
Ruby-based SOLR framework was meant to be used as an advanced
search mechanism across all collections, but the time needed to
implement this feature became an issue. The web site, however, is
just one aspect of the Picturing the Museum image repository; Ruby
is also the language used to develop the computer workflow and
image processing.
Since the Library had extensive existing descriptive metadata
comprising the bulk of a cataloged record, a scripted workflow was
created with the Ruby language to ingest this metadata into
corresponding collections, with a field indicating each specific logbook
for each image. Metadata for these collections was ingested into the
database prior to images being uploaded, and a separate workflow was
created in Ruby to upload images into the repository. During this
upload, each image was associated with a collection, and a TIFF image
with its corresponding service master copy, a cropped original, and
three derivative thumbnails were processed. Embedded rights and
technical metadata were then extracted using Harvard's open source
JHOVE.
Finally, an external library was created in Ruby to manage the
retrieval and display of records in the web presentation environment.
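Loading metadata first and images second means each uploaded scan has to find its waiting record. The Library's workflow was written in Ruby and is not reproduced in the chapter; the Python sketch below only illustrates the matching idea, and it assumes a hypothetical naming scheme in which each scan's filename begins with its image number.

```python
import re

def image_number(filename):
    """Return the leading digits of a filename, or None (hypothetical scheme)."""
    m = re.match(r"(\d+)", filename)
    return m.group(1) if m else None

def attach_scans(records, filenames):
    """Attach each scan to its pre-ingested record.

    `records` maps image numbers to metadata dicts that were loaded
    before any images. Filenames with no matching record are returned
    so that a person can investigate them.
    """
    unmatched = []
    for name in filenames:
        record = records.get(image_number(name))
        if record is None:
            unmatched.append(name)
        else:
            record.setdefault("files", []).append(name)
    return unmatched
```

Returning the unmatched filenames, rather than silently skipping them, gives the team a worklist of scans whose logbook record is missing or misnumbered.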
Cataloging Procedures and Metadata
Documentation
Editing and producing the photo book helped to refine the scope of
the digital project and revealed some descriptive cataloging issues that
would have to be addressed in the creation of an image database. For
example, when producing the book we found that the original captions
from the logbooks often did not translate into a good descriptive title.
Some images, such as 22847, "Mr. Albert with loan collections for school
use," had no logbook caption at all. The title of image 274856 now reads
"Two boys collecting turtles, Trailside Museum," but the original caption
was "Two boys collectors bring ing in turtles," grammatical errors and
typos included. Titles were composed to be as descriptive as possible,
using key words to increase retrieval.
The descriptive data fields were formed according to DACS (SAA:
Describing Archives: A Content Standard, n.d.) and mapped to
Dublin Core (see DCMI Metadata Terms, 2008), unqualified, for
future interoperability through OAI web harvesting. The descriptive
fields for Picturing the Museum include the following:
• Image Number
• Medium (Source Format) (drop down based on the Thesaurus
of Graphic Materials)
• Size
• Original Caption
• Title
• Original Photographer (drop down)
• Copy Photographer (drop down)
• Date
• Geographic location (based on the Thesaurus of Geographic
Names)
• Person
• Institution
• Permanent Hall
• Expedition
• Temporary Exhibition
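A crosswalk from local fields such as those listed above to unqualified Dublin Core can be expressed as a simple lookup table. The mapping below is hypothetical: the chapter states that the fields were mapped to Dublin Core but does not publish the mapping, so each pairing is an assumption chosen for illustration.

```python
# Hypothetical crosswalk; the Library's actual mapping is not given in the text.
CROSSWALK = {
    "Image Number": "identifier",
    "Medium": "format",
    "Title": "title",
    "Original Caption": "description",
    "Original Photographer": "creator",
    "Date": "date",
    "Geographic location": "coverage",
    "Person": "subject",
    "Expedition": "subject",
}

def to_dublin_core(record):
    """Return (dc_element, value) pairs; unmapped or empty fields are skipped.

    Every unqualified Dublin Core element is optional and repeatable, so a
    list of pairs is enough to represent the result.
    """
    pairs = []
    for field, values in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue
        if isinstance(values, str):
            values = [values]
        pairs.extend((element, v) for v in values if v)
    return pairs
```

Several local fields can share one Dublin Core element, as Person and Expedition do here. That loss of distinction is the usual cost of using the unqualified element set for harvesting.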
Roughly based on the VRA Core Element Description (see
Welcome to VRA Core 4.0, n.d.), syntax rules for the title field were
crafted to be brief and consistent. Writing a new title but retaining the
original caption helped maintain integrity to the original data and the
history of the Collection, while allowing for a broader search. For
example, an original caption of "Corean hats" would be edited to
"Korean hats" in the title. A search by either spelling would yield the
record. Also, original captions reflecting social and political realities of
the time of the photography would be retained while a more accurate
or culturally sensitive title could be crafted. For example, the larger
database includes many images of artifacts from the Congo; Congo is
retained in the source data, while the geographic location field now
reads Congo (Democratic Republic), as it appears in the Getty
Thesaurus of Geographic Names (TGN; see Getty Thesaurus of
Geographic Names Online, n.d.), which was used to create the
standard names of places in this field. Similarly, cultural names such
as "Indian" would be changed to "Native American", "natives" to
"people" or "people of ...." Researchers can therefore find information
by both historical and present-day terms, while the original
documentation is retained.
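The effect of keeping both the original caption and the edited title can be shown with a minimal search over the two fields. This is a sketch only: the record layout and the identifier "example-1" are hypothetical, and the production site's search code is not described in the chapter.

```python
def search(records, term):
    """Case-insensitive match against the edited title and the original caption."""
    needle = term.lower()
    return [r["image"] for r in records
            if needle in r["title"].lower()
            or needle in r["original_caption"].lower()]

# Hypothetical record using the caption and title pair discussed above.
records = [
    {"image": "example-1", "title": "Korean hats", "original_caption": "Corean hats"},
]
```

A search for either "Corean" or "Korean" returns the same record, because neither field replaces the other.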
Considering the size of the AMNH Photographic Collection as a
whole, the records were designed for easy and fast data entry that
would comply with documented standards for interoperability.
The geographic location field wasn't as much of a challenge
during Picturing the Museum, but when the database was expanded to
include field photographs and images of artifacts, the metadata
became more complex. For example, a 19th century figure of Burmese
jadeite carved in China required a repeated field for geographic data
(as well as an updated geographic name of Myanmar for Burma).
Because there are so many more place names in the AMNH
Photographic Collection than appear in the TGN, the cataloger uses
the broadest term in the string in the Geographic Location field and
references the specific site or place name in the title or note field.
However, the AMNH Library plans to contribute these place names,
such as Hopetown, British Columbia, to the TGN in the future as the
Photographic Collection contains images from many smaller villages
and localities worldwide.
Information that isn't suitable for any other field and that isn't
found in the TGN or the Library of Congress Authorities but is
necessary for describing the image is placed in the title or note field so
that all data is captured and will appear in search results.
Working on the book also highlighted the need for local
authorities for photographers' names, permanent halls, temporary
exhibitions, and Museum expeditions. The first developed was the
photographer list. Using SOLR, the programmer produced a unified
list of the many name variations in the 23 logbooks. Names were
validated in the catalog and files of the AMNH Research Library as
well as through the Library of Congress Name Authority Headings,
then inserted into a drop-down box on the cataloger's interface.
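Producing a unified list of name variations amounts to grouping strings that reduce to the same matching key. The Library did this with SOLR; the Python sketch below is a simpler stand-in, not the method used, and the normalization rule and the sample variants are assumptions.

```python
import re
from collections import defaultdict

def name_key(raw):
    """Crude matching key: reorder 'Last, First', drop punctuation, lowercase."""
    if "," in raw:
        last, first = raw.split(",", 1)
        raw = f"{first} {last}"
    return " ".join(re.sub(r"[^\w\s]", " ", raw).lower().split())

def group_variants(names):
    """Group the name strings that share a matching key."""
    groups = defaultdict(set)
    for n in names:
        groups[name_key(n)].add(n)
    return dict(groups)

# Hypothetical variants of two photographers named in the chapter.
variants = ["Curtis, Edward S.", "Edward S. Curtis", "edward s curtis",
            "Roy Chapman Andrews"]
groups = group_variants(variants)
```

A key this crude would not connect initials to full names or catch misspellings, which is one reason each grouped name still has to be validated against an authority file before it goes into a drop-down list.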
For over a century of photographs, there were over 262
photographer names, including celebrated individuals such as Roy
Chapman Andrews and Edward S. Curtis. For many images, no
photographer was listed at all and "Unknown" was entered. Authority
lists are also required for temporary exhibits, permanent halls, and
expeditions. Not only are these terms unique to the American
Museum of Natural History, but they have changed over the years
prompting discussions about whether to use current Museum terms
or historic ones to reflect the Museum the way it is today, or the way it
was when the images were photographed.
Choosing metadata fields and creating a data dictionary that
reflects the content in all collections as well as in Picturing the
Museum requires planning and flexibility.
Database Growth
Because the image database continues to grow, continuity in staffing
and procedures has proven valuable. A guide to cataloging procedures
is available on the web site (see PTM Cataloging Procedures, n.d.).
Expanded examples and data fields for other Collections are
documented internally. The cataloger recorded issues throughout this
entire process allowing the programmer to make improvements to the
interface to accommodate the metadata needs of different collections.
Documentation also helped to prepare for the training of student
interns and volunteers to catalog in the database.
The cataloging interface for the project evolved over time and
continues to change as additional image collections are digitized and
cataloged. For example, a 1928 hand-colored lantern slide from a
Museum expedition to Mongolia will require different metadata than
images selected for Picturing the Museum due to medium, the need
for additional local subjects, and a field for Museum expedition, and
there are many such items in the Collection.
Fields added to the database after Picturing the Museum include
the following:
• Artist (drop down)
• Publication
• Catalog/Specimen Number
• Object/Specimen Location
• Department/Discipline (drop down)
• Cultural Context
• Common Name
• Scientific Name
• Local Subject (drop down)
Drop down lists are used for ease and consistency in fields like
original and copy photographer, artist, department/discipline and
local subject. This last field allows for a somewhat standardized
vocabulary of frequently used subject terms that can be searched. For
example, without this data field, searching for "birds" would not yield
an image titled "Cardinal." Likewise, images of dinosaurs often don't use
that word in the title; "skull of Tyrannosaurus Rex," for instance, would
not be found by searching "Dinosaur." This approach will yield a large if not
comprehensive data set for the general researcher. More precision will
result from geographical headings and fields with authority controlled
vocabularies. Drop down lists will be added when the authority work
is completed for Permanent halls, Temporary exhibits and Museum
expeditions.
Lessons Learned and the Future of the Database
The type of images in Picturing the Museum (historical documentary
photographs generally taken in the Museum and the New York City
area) is a subset of the AMNH Photographic Collection. Other
kinds of pictures coexist within that larger collection: field
photographs taken on Museum sponsored expeditions, images of
specimens and artifacts, and portraits of individuals. Authority work
is a current priority, particularly for names of Museum halls and
Museum expeditions. Lists are nearly completed for permanent halls
and temporary exhibits but creating authorities for Museum
expeditions is a larger and more complicated task and will require
some time.
Picturing the Museum is the only portion of the Image Database
available to the public at this time but the entire database is searched
frequently by the Library staff to fulfill research requests from both
inside and outside of the Museum. This AMNH Image Database
combines the logbook data, another smaller color database as well as
sets of data specific to collections that were not integrated into the
larger numbering scheme. The staff continues to make documentation
of procedures and data standards a priority, especially as the database
grows to include the metadata fields for other types of collections, e.g.
specimen or artifact numbers, artists' names and the names of culture
groups. Work is proceeding with the help of volunteers. The part-time
image cataloger for the METRO project, now a permanent staff
member, is developing a training program for volunteers and interns
to edit the catalog records based on a similar plan using volunteers to
scan and process images. The AMNH Library has close to 60,000
images scanned and over 200,000 largely unedited records.
Researchers request photographs on a routine basis, and even
though most of the records are unedited, the database has become an
indispensable time-saving tool to search for an image and then
determine whether or not it has been scanned. Providing individual
images for use in books, articles, classroom projects and videos is a
mission related service of the Library. However, it is also important to
recognize that this is a photographic archive and for scholarly research
projects, it is essential to view the images together in their historical
context. Despite a century-long tradition of item level description, one
of the major opportunities provided by the Image Database will be to
recollect the collections of images that have been dispersed in the
Museum collections over time and place them in their historical
context.
The Library has begun a prototype of a web template to be used as
an interface to contextualize collection specific information. By
identifying and grouping the images, e.g. by the name in the
expedition field, it will be possible to retrieve a preselected group of
images from that expedition and link to a select dataset as the
container list on the web site. The other text (e.g., site overview
(scope and content), biographical notes, historical notes,
bibliographies, other resources, map) can be easily encoded into an
EAD (see Encoded Archival Description Version 2002 Official Site,
2009) finding aid. As more of these special photographic collection
web sites are developed, the Library will create a separate interface
on its home page for direct access to these collections. As part of that
effort, it will also integrate the means for viewers to comment on the
images. Members of originating cultural communities, scholars,
students and others will be able to correct and add to the information
currently associated with the images, expanding the knowledge
surrounding their history.
Mapping to EAD and at the same time making the collection level
records for the web sites harvestable by OAI, will also increase the
possibility of retrieval of these collections.
In a world of technological change and seemingly limitless
possibilities for communication, the AMNH Library is expanding on
Bickmore's vision for sharing visual resources and knowledge for the
people, for science, and for education, now and in the future.
References
PTM Cataloging Procedure. (n.d.). Retrieved May 1, 2010 from
http://images.library.amnh.org/photos/PTM_CatalogingProcedures.pdf
DCMI Metadata Terms. (2008). Retrieved May 1, 2010 from
http://dublincore.org/documents/dcmi-terms/
Encoded Archival Description Version 2002 official site. (2009).
Retrieved May 1, 2010 from http://www.loc.gov/ead/
Getty Thesaurus of Geographic Names Online. (n.d.). Retrieved May
1, 2010 from
http://www.getty.edu/research/conducting_research/vocabularies/tgn/
SAA: Describing Archives: A Content Standard (DACS). (n.d.).
Retrieved May 1, 2010 from
http://www.archivists.org/governance/standards/dacs.asp
Welcome to Solr. (2010). Retrieved March 30, 2010 from
http://lucene.apache.org/solr/
Welcome to VRA Core 4.0. (n.d.).
http://www.vraweb.org/projects/vracore4/
Part III - The Digital
Campus: Digitization in
Universities and Their
Libraries
Developing an Institutional Repository at
Southern New Hampshire University: Year
One
Alice Piatt (Southern New Hampshire University)
Abstract
In 2008, Southern New Hampshire University was awarded a three-
year, $500,000 national leadership grant from the Institute of
Museum and Library Services to create a digital repository using
DSpace open source software. Events from the first year of the
repository's development are presented and discussed. Key elements
addressed include the challenges involved with customizing the
DSpace infrastructure, creating standards for access and master files,
implementing metadata standards, and developing digital
preservation policies. The value of cross-departmental participation is
shown, and the importance of planning for digital preservation is
presented.
Keywords: Best practices, Digital library, ETD, Electronic,
Interfaces, Institutional repository, Open source, Scans.
Introduction
In 2008, Southern New Hampshire University was awarded a three-
year, $500,000 national leadership grant from the Institute of
Museum and Library Services (IMLS) to create a digital repository
using DSpace open source software. The inspiration for the project
was a collection of student theses and dissertations from the School of
Community Economic Development (SCED). SCED is a unique
program with participation from all over the world, particularly the
United States and Tanzania, and also countries such as Uganda, Peru
and the Philippines. Like many thesis collections, the projects were
printed using consumer-grade equipment, and only one copy was
bound and saved for the library. The international nature of the
projects, in addition to the danger of losing them to deterioration,
made them an attractive collection for beginning a digital repository.
Faculty papers from the International Business program were also
included in the grant project, to make papers once only accessible
from a professor's office available to the world.
Many institutions lack the financial and human resources to build
a successful digitization program. The gap between the resources
available and the resources required can often be bridged by a grant; a
search for IMLS grants from 2004-2008 using the keyword "digitization"
shows that at least 57 IMLS grants were provided to libraries and museums
for digitization projects (IMLS, 2009). As at many institutions, the
repository at SNHU's Shapiro Library could not have come to fruition
without grant assistance.
The following pages share the Shapiro Library's experiences
during the first year of repository development.
The People Involved
Digitization programs need a strong level of organization and
administrative support to succeed. Programs that only live within the
walls of the library without buy-in from administration and other
departments are at risk of failure for lack of support. The Shapiro
Library's digital repository is managed by a Digital Initiatives
Librarian, who receives support from the Digital Content Specialist,
two graduate assistants, and two cross-departmental committees: the
Implementation Committee and the Policy Team.
The Digital Initiatives Librarian is responsible for managing the
repository, including creation of metadata standards, scanning
workflows, policy development, and quality control. The Digital
Content Specialist creates descriptive keywords, and writes abstracts
for the theses. Two graduate assistants were hired to execute the
scanning, optical character recognition (OCR), and access file
creation.
The Implementation Committee was initially organized to
prepare the grant application, and after the grant was received,
organized the necessary infrastructure. Represented on the committee
are the Library Dean, the Electronic Resources Librarian, the
University Webmaster, the Dean of the School of Community
Economic Development, and both the head of the IT department and
the IT programmer committed to the project. The committee hired the
Digital Initiatives Librarian and a Digital Content Specialist, who both
subsequently joined the committee. The Implementation Committee
continues to meet on an as-needed basis to monitor the repository's
development.
While some members of the Policy Team also serve on the
Implementation Committee, the focus for this group is to determine
policies for the repository and discuss other questions that might
arise, whether they are related to file format, collection development,
or metadata. Because of the nature of the team, there are more
librarians represented: the Electronic Resources Librarian, Technical
Services Librarian, and the Access Services Librarian are all part of the
team, as well as the Digital Initiatives Librarian, Digital Content
Specialist, and the Library Dean. Also on the team are the IT
programmer and the Associate Dean of the Faculty. The associate
dean's participation is effective in keeping the university
administration informed on the progress and policies of the
repository. The Policy Team initially met every two weeks, and
continues to meet at least once a month.
Developing the Technical Infrastructure
After identifying the initial collections for the repository, the
Implementation Committee selected the digital repository software
and the hardware on which it would reside. Oya Y. Rieger (2007)
explains that when selecting software, a number of factors should be
considered, including matching your institution's needs to the
software's features, considering what resources will be required to
install and maintain the software, and assessing the overall usability
for both staff and end-users. Often the question might arise: to open
source, or not to open source? While using open source software is the
current trend, institutions should look closely at their resources to
determine if they can support the technological and human resources
required to work with open source software packages.
The grant awarded to SNHU included funds to hire the Digital
Initiatives Librarian and the Digital Content Specialist. It also
financially supported time spent working on the grant by other
positions already in place, including IT. Assessment of these resources
determined that enough support was available to consider open
source software. DSpace stood out as the most widely-used open
source institutional repository software package available for
academic library use, with an active user community and a wide array
of resources available (DuraSpace, 2009, Resources).
The differences between implementing open source versus
proprietary software quickly became apparent. While DSpace is
advertised to be useful "out of the box," this is not a realistic assertion
(DuraSpace, 2009, About DSpace, para. 1). A certain level of
programming skill and time is required in order to customize the
software. In DSpace, the level of programming needed to make
customizations beyond changing the color scheme of the website can
be daunting for someone without experience in both programming
and website design. The Digital Initiatives Librarian's web design
skills and the IT programmer's skills were both needed to make most
of the necessary customizations to the user interface.
Community support - While the DSpace community is very
active, with a well-populated wiki and listservs for general and
technical questions, it is also a complex community. Users vary by
what platform they work on (Linux vs. Windows) and what version of
DSpace they use. During the time of SNHU's installation, most of the
user community was working with either DSpace 1.4 or 1.5. To further
complicate things, some users of 1.5 were using what is known as the
JSP user interface, while others used the XML user interface - each
involving different programming methods for customization.
Therefore, not all questions and answers posted by the community are
relevant to one's needs. One example encountered was a DSpace wiki
entry explaining how to change the DSpace code to enable linking
authors in a simple item record. When the code did not function
properly, the question was posted to the DSpace tech listserv. Another
community member explained that the encoding described in the wiki
had changed in version 1.5 (Piatt, 2009). Additionally, answers to
questions regarding installation varied widely depending on whether the user
was on Linux or Windows. While DSpace does have a large user
community, that community requires some careful navigation.
Professional development - In early June 2009, the NITLE
consortium presented a timely DSpace workshop (NITLE, 2009). The
variety of sessions provided a strong background to DSpace's
capabilities. One session in particular, "Developing Interfaces and
Interactivity for DSpace with Manakin Workshop" by Eric Luhrs of
Lafayette College, was extremely helpful, providing tools and the
know-how necessary to make customizations to the XML user
interface (Luhrs, 2009). Without the benefit of this interactive
instruction, the learning curve would have been much steeper.
The experience at the DSpace workshop points to the importance
of this type of professional development in the rapidly-changing
digital library environment. Conferences such as the Open
Repositories Conference and the Joint Conference on Digital Libraries
have both included specific DSpace sessions and workshops in the
past. A simple search of the web reveals user groups and workshops
available for other digital library platforms, including proprietary
software such as OCLC's CONTENTdm. Providing funding for
librarians and IT staff to attend these types of educational events
should be a priority for any institution embarking on a digitization
project.
From Paper to Electronic
Creating an electronic record for access involves metadata authoring,
scanning, and access file creation.
Metadata - Metadata standards should be determined before
the first item is ever added to the repository. Because qualified Dublin
Core is installed with DSpace by default, and because Dublin Core is
the leading schema for describing digital resources, it was selected as
the schema. Determining which elements to make available in the
DSpace submission form was more challenging. Not every element
should be used to describe a digital object - not all are appropriate for
all collections. Besides, the time-consuming nature of metadata entry
requires that standards be chosen with efficiency in mind. Michael
Boock and Sue Kunda (2009) explain how creating a metadata record
for both the DSpace repository and MARC catalog can take up to an
hour per record, even when students create the majority of the
descriptive metadata (pp. 300-302). While it is important to consider
descriptive, administrative, structural, and preservation metadata,
these elements must be chosen carefully to achieve thorough, but cost-
effective item description.
The CDP Metadata Working Group's "Dublin Core Metadata Best
Practices", the DCMI Usage Board's "DCMI Metadata Terms", the
Scholarly Works Application Profile as described by Julie Allinson
(2008), and the Networked Digital Library of Theses and
Dissertations' metadata standard (Atkins, Fox, France, & Suleman,
2008) were all examined. From these best practices, 32 qualified
Dublin Core elements were selected, with the intention that any item
added to the repository could be appropriately described using some
or all of these elements. Approximately 20 of these are used to
describe the SCED thesis projects in particular.
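To make the idea of a qualified Dublin Core record concrete, the following Python sketch builds a small record for a single thesis and writes it as XML. It is a minimal illustration, not the Shapiro Library's actual element set: the seven fields and all of their values are hypothetical, and the dublin_core/dcvalue layout is assumed to follow the dublin_core.xml file used by DSpace's batch import (Simple Archive Format).

```python
import xml.etree.ElementTree as ET

# Hypothetical description of one thesis; every value is made up for illustration.
# Each entry is (element, qualifier, value); None means an unqualified element.
thesis_fields = [
    ("title", None, "A sample community development project"),
    ("contributor", "author", "Doe, Jane"),
    ("date", "issued", "1995"),
    ("description", "abstract", "A short abstract written by library staff."),
    ("subject", None, "Community development"),
    ("type", None, "Thesis"),
    ("format", "mimetype", "application/pdf"),
]


def build_dublin_core(fields):
    """Return a <dublin_core> element with one <dcvalue> child per field."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        attributes = {"element": element, "qualifier": qualifier or "none"}
        dcvalue = ET.SubElement(root, "dcvalue", attributes)
        dcvalue.text = value
    return root


record = build_dublin_core(thesis_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the field list short, as here, reflects the cost argument made above: every additional element is one more value that someone must enter and check for each item.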
Scanning - During the development of the DSpace
infrastructure, the scanning workflow was also launched. The initial
collection of student theses and dissertations from SCED proved to be
challenging to scan. Part of the purpose of the SCED thesis project is
to document work completed by the student in the field, outside of the
classroom. To that end, most of the theses, collected from 1984 to the
present, include large appendices of documentation including letters,
financial statements, marketing materials, photographs, architectural
plans, and even a wall calendar used as a fund-raiser. Additionally,
students were given the opportunity to be creative in their
presentation, often using color, graphs, and decorative fonts.
Sample theses were selected and scanned by the graduate
assistants to test how the scanner and OCR software would handle the
diverse materials. These initial scans immediately raised questions.
There were not yet policies in place for how much information should
be captured in the scans, causing uncertainty when incidental color
was encountered, such as flyers printed on colored paper.
Additionally, there was confusion surrounding the fact that the
digitization process includes preserving master files, saved in
traditional TIFF format, in addition to the PDF files created for access.
The IT staff was not prepared to store and preserve this large
collection of master files, and panic arose about their massive size -
the files, scanned at 600 dpi in grayscale or color, were 30 to 80
megabytes each. This "megabyte shock" is not unusual, particularly at
small institutions; Stacy Nowicki (2008) also noted problems with
large TIFF files at Michigan's Kalamazoo College.
After much discussion, the Policy Team agreed to scan the papers
for their intellectual content only. Best practices from the California
Digital Library (2008) and the CDP Digital Imaging Best Practices
Working Group (2008) were consulted to determine digitization
standards: a 600 dpi setting for black and white pages, and 500 dpi
for grayscale or color pages. This 500 dpi setting resulted in a
minimum of 4000 pixels on the longest side of the scan, in accordance
with these recommendations (California Digital Library, 2008, 3.6.1; CDP,
2008, p. 8). If the pages were a significantly different size, the dpi
setting would be adjusted to meet this parameter. Master
files are saved in TIFF format (California Digital Library, 2008, 3.2).
Grayscale and color are only used when necessary to preserve the
intellectual content of the document, leaving most of the pages to be
scanned as black and white. As a result, the master files are much
smaller; the black and white scans are approximately 4 megabytes
each.
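The relationship between page size, resolution, bit depth, and file size can be checked with simple arithmetic. The Python sketch below is an estimate only: it assumes US Letter pages (8.5 by 11 inches), uncompressed image data, 1 bit per pixel for black and white and 8 bits per pixel for grayscale, and it ignores TIFF header overhead. The function names are illustrative.

```python
import math


def uncompressed_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate uncompressed image size in megabytes (10**6 bytes)."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


def dpi_for_long_edge(width_in, height_in, target_pixels=4000):
    """Smallest whole-number dpi that puts target_pixels on the longest side."""
    return math.ceil(target_pixels / max(width_in, height_in))


# Assumed page size: US Letter, 8.5 x 11 inches.
letter = (8.5, 11)
bitonal_600 = uncompressed_megabytes(*letter, dpi=600, bits_per_pixel=1)
gray_600 = uncompressed_megabytes(*letter, dpi=600, bits_per_pixel=8)
gray_500 = uncompressed_megabytes(*letter, dpi=500, bits_per_pixel=8)

print(f"black and white, 600 dpi: {bitonal_600:.1f} MB")
print(f"grayscale, 600 dpi: {gray_600:.1f} MB")
print(f"grayscale, 500 dpi: {gray_500:.1f} MB")
print(f"long edge at 500 dpi: {int(max(letter) * 500)} pixels")
print(f"dpi needed for a 5 x 7 inch page: {dpi_for_long_edge(5, 7)}")
```

Under these assumptions a black and white Letter page at 600 dpi comes to roughly 4.2 MB, in line with the figure of approximately 4 megabytes reported above, while the same page in 8-bit grayscale at 600 dpi is about 33.7 MB. The second function shows why a smaller page would need a higher dpi setting to keep 4000 pixels on its longest side.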
The solution to the color question requires a certain amount of
human judgment, but is viable because the Digital Initiatives
Librarian and the graduate assistants conducting the scanning are
located in close proximity to one another, facilitating an environment
for quick decisions. David Lowe and Michael Bennett (2009) state
that the Internet Archive chose to scan all their documents in color,
eliminating the need for human judgment (p. 210).
Access File Creation - After creating master TIFF files, it is
necessary to convert them for public access. The Portable Document
Format (PDF), processed so that full-text searching is possible,
is ubiquitous among subscription and open access academic
databases. It was the obvious solution for our collection.
To enable full-text searching, the TIFF files were processed using
optical character recognition (OCR) software. ABBYY FineReader 9
Professional was selected, based on a review in PC Magazine
(Mendelson, 2008). This feature-rich software supports character
recognition and error-checking in multiple languages, and performs
well with most text, including text printed with a dot-matrix printer,
and text formatted in blocks, such as in newsletters and flyers.
From FineReader, the graduate assistants are able to save the
PDF with an option called "text under image," saving the corrected
OCR text in an invisible, searchable layer under the scanned page
image. In order to keep the size of the file reasonable, the PDF images
are saved at 300 dpi; to enhance accessibility, the option for creating a
tagged PDF is selected (Johnson, 2004).
After the PDF is created, it is opened in Adobe Acrobat, and
additional metadata is added to the file's properties, including title,
author, and copyright status.
Information in the repository should be not just available, but
accessible to all. This includes keeping file sizes small enough to
allow fast load times, ensuring that even users with dial-up modems can
download the files in a reasonable amount of time. According to a
survey led by John Horrigan at the Pew Internet & American Life
Project (2009), seven percent of Internet users in the United States
are using dial-up services at home (p. 7). While seven percent sounds
small, it is equal to approximately 9 million households in the United
States, out of the 129 million counted by the U.S. Census as of July 1,
2008. International user statistics vary widely, but it would be best to
avoid frustrating any users with unnecessarily large file sizes, thus
increasing the viability of the collection.
Therefore, nearly all of the projects are split into two PDFs.
Because the bulk of most of the thesis projects is the supporting
documents in the appendix, the papers and their appendices are saved
as separate PDF files. Of the first 88 student projects scanned, the
average file size of the project paper by itself was 2.33 megabytes, with
a median of 1.73. The appendices' average was 8.29 megabytes, with a
median of 5.2. Both the main paper and the appendix PDFs are
available from the same item record in DSpace.
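The effect of these file sizes on a dial-up user can be estimated with a short calculation. The Python sketch below is an idealized illustration: it assumes a nominal 56 kbps line running at full speed with no protocol overhead, so real downloads would be slower. The size figures are the averages and medians reported above; the function name is hypothetical.

```python
def download_minutes(size_megabytes, line_speed_kbps=56.0):
    """Idealized transfer time in minutes for a file of the given size."""
    bits = size_megabytes * 8 * 1_000_000       # megabytes taken as 10**6 bytes
    seconds = bits / (line_speed_kbps * 1_000)  # assumes the full nominal line rate
    return seconds / 60


# Average and median PDF sizes reported above for the first 88 projects.
reported_sizes_mb = {
    "paper, average": 2.33,
    "paper, median": 1.73,
    "appendix, average": 8.29,
    "appendix, median": 5.2,
}

for label, size_mb in reported_sizes_mb.items():
    print(f"{label}: {size_mb} MB, about {download_minutes(size_mb):.1f} minutes")

# The household estimate quoted above: seven percent of 129 million.
dialup_households = 0.07 * 129_000_000
print(f"dial-up households: about {dialup_households / 1_000_000:.1f} million")
```

Under these assumptions the average main paper takes roughly five and a half minutes over dial-up and the average appendix close to twenty, which is the practical argument for offering the two as separate downloads.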
To improve the access files' longevity and accessibility, the PDFs
are saved as PDF/A when possible. Roger Reeves and Hans Barfuss
(2009) explain that the goal of the International Organization for
Standardization (ISO) for PDF/A is that it "provides a mechanism for
representing electronic
documents in a manner that preserves their visual appearance over
time, independent of the tools and systems used for creating, storing
or rendering the files" (The Goal of PDF/A). One advantage of PDF/A
is that elements such as fonts are embedded in the file, so they
display properly even if the user does not have those particular
fonts installed. The utility
provided in Adobe Acrobat Professional 9 was used to save the files in
PDF/A format.
The Often-Missed Point: Digital Preservation
While all of the decisions involved with customizing the DSpace user
interface were being addressed, one major component of the
digitization program was neglected: digital preservation. Although it
was understood that digital preservation was an issue, it was
uncertain how preservation would be accomplished.
The ICPSR Digital Preservation Workshop at the University of
Michigan was an excellent opportunity to learn more about digital
preservation. This five-day, in-depth workshop made it clear that if an
institution presents digital documents online, there is an assumption
that they will be preserved there forever - much like a book on the
shelf is expected to be readable ten, fifty, or even hundreds of years
after it is bound. However, digital files are fragile in their own way,
and are susceptible to obsolescence, storage media problems, and
other issues (Cornell, 2007, Tutorials, chapter 3: Obsolescence &
Physical Threats).
While digital preservation is a complex topic with many
components and considerations, the primary concern was how to
adequately care for the master TIFF files. Each image must be
preserved in the event that the access PDF file becomes corrupted, or
when PDF is superseded by a new access file format. The Digital
Preservation Tutorial developed by Cornell makes it clear that
institutions cannot burn files to CDs, put one CD on the shelf, another
in someone's garage, and believe they have preserved their files
(2007). According to the tutorial, even CD standards have changed
over the years, and early formats are now obsolete (Chapter 3:
Obsolescence, Chamber of Horrors, Disk Media). It is also apparent
that backing up files without including any descriptive information is
still not adequate preservation; how many of us have opened a floppy
disk and wondered, "What the heck is all this stuff?" Master files must
also have their own metadata associated with them to describe what
they are. But learning how to preserve these files, as well as
adequately preserving the access files and their associated metadata in
these early years of digital preservation, is a challenging process that
has not been adequately addressed during the first year of repository
development at the Shapiro Library. It is probable that many other
institutions have also not addressed their own digital preservation
questions, or even asked them.
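One small piece of this problem, keeping descriptive information with each master file, can be sketched as follows. This is an illustration under stated assumptions and not a description of the Shapiro Library's practice or of a complete preservation workflow: the sidecar JSON layout, the field names, and the file name are hypothetical, and the SHA-256 checksum is an addition of this sketch, included because a stored checksum lets a later copy be compared against the original.

```python
import hashlib
import json
import os
import tempfile


def describe_master_file(path, description):
    """Build a small metadata record for one master file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large TIFF files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256.hexdigest(),
        "description": description,
    }


# Demonstration with a throwaway file standing in for a master TIFF.
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "thesis-0001-page-001.tif")
    with open(sample_path, "wb") as handle:
        handle.write(b"placeholder bytes, not a real TIFF")
    sample_record = describe_master_file(
        sample_path, "Hypothetical thesis, page 1, black and white master scan"
    )
    # Write the record next to the file it describes.
    with open(sample_path + ".json", "w", encoding="utf-8") as handle:
        json.dump(sample_record, handle, indent=2)
    print(json.dumps(sample_record, indent=2))
```

A record like this answers the "what is all this stuff?" question for a single file; it does not by itself address format obsolescence, storage media, or the wider requirements of an OAIS-compliant program.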
The problems with file obsolescence and data backup are just one
small component of creating a digital preservation program. The
guidelines presented by the ICPSR workshop are a helpful resource in
determining how to ensure that digital preservation at the Shapiro
Library is compliant with standards described by the Reference Model
for an Open Archival Information System (OAIS), an industry
standard. Much progress is anticipated for the second grant year.
Conclusion
The myriad of details involved with creating a digital repository at
Southern New Hampshire University were more complex than
anticipated. Learning and implementing standards for metadata,
master files and access files was time-consuming, but taking the time
to establish standards in the beginning doubtless saved a great deal of
trouble for the future. Even so, it will be necessary to keep up with
developing industry standards, and it would not be surprising if
further adjustments are needed down the road. A digital repository is
much like a physical building; periodic maintenance, remodeling, and
wear and tear should be expected.
The Shapiro Library's digitization program has strong
administrative support, participation from several university
departments, and strong financial resources. The repository will
become a successful program for the university long after the grant
period concludes.
References
Allinson, J. (2008). Describing scholarly works with Dublin Core: A
functional approach. Library Trends 57(2), 221-243.
Atkins, A., Fox, E., France, R., & Suleman, H. (2008). ETD-MS: An
interoperability metadata standard for electronic theses and
dissertations, ver. 1.00, rev. 2. Retrieved from
http://www.ndltd.org/standards/metadata/etd-ms-v1.00-rev2.html
Boock, M., & Kunda, S. (2009). Electronic thesis and dissertation
metadata workflow at Oregon State University Libraries.
Cataloging & Classification Quarterly, 47(3/4), 297-308.
doi:10.1080/01639370902737323
California Digital Library. (2008). CDL guidelines for digital images.
Retrieved from
http://www.cdlib.org/inside/diglib/guidelines/bpgimages/reqs.html
CDP Digital Imaging Best Practices Working Group. (2008). BCR's
CDP digital imaging best practices version 2.0. BCR. Retrieved
from http://bcr.org/dps/cdp/best/digital-imaging-bp.pdf
CDP Metadata Working Group. (2006). Dublin Core metadata best
practices: version 2.1.1. BCR. Retrieved from
http://www.bcr.org/dps/cdp/best/dublin-core-bp.pdf
Cornell University Library. (2007). Digital preservation
management: Implementing short-term strategies for long-term
problems. Retrieved from http://www.icpsr.umich.edu/dpm/
DCMI Usage Board. (2008, January 14). DCMI Metadata terms.
Retrieved from http://dublincore.org/documents/dcmi-terms/
DuraSpace. (2009). DSpace. Retrieved from http://dspace.org
Horrigan, J. (2009). Home broadband adoption 2009. Pew Internet
& American Life Project. Retrieved from
http://www.pewinternet.org/Reports/2009/10-Home-Broadband-Adoption-2009.aspx
Institute of Museum and Library Services. (2009). Search awarded
grants. In Grant Search. Retrieved from http://imls.gov
Johnson, D. (2004). What is tagged PDF? In Accessible PDF
Learning Center. Retrieved from
http://www.planetpdf.com/enterprise/article.asp?ContentID=6067
Lowe, D. B. & Bennett, M. J. (2009). A status report on JPEG 2000
implementation for still images: The UConn survey. Archiving
2009, 6, 209-212.
Luhrs, E. (2009). NIS Camp: Developing interfaces and interactivity
for DSpace with Manakin. Retrieved from
http://nitlecamp.pbworks.com/f/manakin-workshop-slides.pdf
Mendelson, E. (2008). ABBYY FineReader Professional 9.0. PC
Magazine. Retrieved from
http://www.pcmag.com/article2/0,2817,2305621,00.asp
NITLE Information Services. (2009). Nitlecamp. Retrieved from
http://nitlecamp.pbworks.com/
Nowicki, S. (2008). Using DSpace for institutional repositories.
Retrieved from http://hdl.handle.net/10090/4522
Piatt, A. (2009, Oct. 6). Two theme modification questions. Message
and responses, archived at http://sourceforge.net/
mailarchive/forum.php?forum_name=dspace-tech
Reeves, R., & Barfuss, H. (2009). PDF/A - A new standard for
long-term archiving. PDF/A Competence Center. Retrieved from
http://www.pdfa.org/doku.php?id=pdfa:en:pdfa_whitepaper
Rieger, O.Y. (2007). Select for success: Key principles in assessing
repository models. D-Lib Magazine, 13(7/8).
doi:10.1045/july2007-rieger
U.S. Census Bureau. (2009). Annual estimates of housing units for
the United States and States: April 1, 2000 to July 1, 2008 (HU-
EST2008-01). In Housing Units: State Housing Unit Estimates:
2000 to 2008. Retrieved from
http://www.census.gov/popest/housing/HU-EST2008.html
Digitization of the Yale Daily News
Historical Archive
Kathleen Bauer, Ian Bogus, Karen Kupiec
(Yale University Library)
Jennifer Weintraub (UCLA Library)
Abstract
The Yale Daily News is Yale University's independent student-run
newspaper. Founded in 1878, it is the oldest continuously published
daily newspaper at a United States university. From the initial print
volumes until digital versions started in 2000, the entire run of the
printed paper consists of 122 volumes and approximately 100,000
pages. In 2007, Yale University Library was asked to create a pilot
project to digitize and make available an initial set of ten years of the
newspaper with a $50,000 start-up budget. In this article, we will
discuss how the project began, and issues that developed during the
process related to copyright, interface design, workflow, quality
control, and fundraising. This project helped Yale University Library,
a large, strongly hierarchical institution, to develop workflows that
allow its staff to develop new skills and work across traditional
departmental boundaries. Library staff who have traditionally
performed tasks related to our print collections or for smaller digital
projects have developed new skills and methods for workflow,
metadata creation and quality control for a large-scale digital project.
Keywords: Campus newspaper, Copyright, Newspaper digitization,
METS, Quality control, Yale, Scanning.
A newspaper digitization project is one that every library, public or
academic, can undertake. It is often not hard to get the rights to a local
or small paper and an academic or public library has a built-in
audience for this type of project. Local researchers will love having it
online and genealogists from further afield will bless you. And yet,
newspaper digitization, while having recently come into its own, has
been somewhat difficult for libraries. Newspapers are crucial to
research, providing detailed local and international accounts of
events; these incredibly important primary source materials are made
of poor-quality material that will last a relatively short period of time.
Newspapers are hard to digitize because they are published daily, with
hundreds of issues a year, each composed of individual sections and, in
turn, individual articles. They are oversized and delicate, and they
contain thousands of words and pictures that require careful quality
assurance. In addition, they have unusual layouts, and articles are often
split across two or more nonconsecutive pages. There can be
numerous contributing authors, syndicated cartoons, advertisements,
supplements, and even the occasional joke issue.
Fortunately, newspaper digitization is not new. Many
organizations have taken on newspaper digitization and the major
national and regional newspapers are now available for licensing by
libraries. While many projects focus on digitization of newspapers
from microfilm, there is also an increasing number of digitization
projects that begin with the original paper. One important
clearinghouse for information and best practices for newspaper
digitization is the National Digital Newspaper Program, or NDNP (The
Library of Congress, 2009). This program, a joint effort between the
Library of Congress and the National Endowment for the Humanities,
uses grant funding to support newspaper digitization, research in
newspaper digitization, and access to the digitized papers through a
central resource.
The Yale Daily News, Yale University's student-run newspaper, is
132 years old and is the oldest continuously published daily
newspaper at a United States university. In 2007, Yale University
Library (YUL) was asked to create a pilot project to digitize the
newspaper with an initial $50,000 start-up budget provided by the
Yale Daily News's parent foundation (the Oldest College Daily
Foundation) and YUL. In this article, we will discuss how the project
began, and issues that developed during the process related to
copyright, interface design, workflow, quality control, and
fundraising.
This project helped YUL, a large, strongly hierarchical institution,
to develop workflows that allow its staff to develop new skills and
work across traditional departmental boundaries. Staff across the
Library who have traditionally performed tasks related to our print
collections or for smaller digital projects have developed new skills
and methods for workflow, metadata creation and quality control for a
large-scale digital project.
The Yale Daily News (YDN) is staffed and produced by student
volunteers. The paper is not owned by Yale University, and the
student reporters and editors are advised by the independent Oldest
College Daily Foundation (OCD). OCD is composed of former YDN
staffers and Yale graduates. In 2005 the OCD came to the YUL with an
idea for a project to digitize the Yale Daily News archive and provide
access on the Internet. The OCD recognized the complexity of the
proposal, especially since it did not own a complete run of the
newspaper. It asked YUL to become a partner: OCD owned
the rights to the content, while YUL had the expertise and the means to
make it accessible. To finance the pilot project, OCD and YUL
contributed $25,000 each. YUL decided that
for the pilot project we would not digitize anything for which there
was an existing digital edition (the YDN has been available online
since 2000). Thus, we still had to choose a small amount of material
from 120 years of print issues, or a fraction of the 100,000 pages in
the entire run, for our initial digitization pilot.
This type of partnership, between an external group owning
copyright and the campus library, can be useful for both parties. It is a
good way for library staff to gain experience with a complex
digitization project and digital collection building, it provides useful
material for fundraising for technology projects, it enables the library
to provide a useful resource to the campus community, and it enables
both the newspaper and the library to offer, free of charge, an online
product with research value that may not have a large sales
market.
Several basic principles helped guide the development of the Yale
Daily News Historical Archive. Open or commonly used standards for
our digital files were important in the event content needed to be
migrated to new interface software in the future. We wanted to
digitize each newspaper in its entirety, thereby preserving the
historical context provided by editorials, cartoons and advertisements.
Therefore it was important that we capture the images on each page,
not only the text. Another key principle was our requirement that the
Yale Daily News be freely available on the Internet. Finally, we wanted
the newspaper to be fully searchable, browsable, and to include
advanced search features such as byline and title searches. These
principles are similar to those elucidated by NDNP and other
newspaper digitization projects.
The Initial Decisions
Our first decisions concerned how we would scan the images and what
the quality needed to be. Scanning from original source materials will
almost always provide the truest digital image but depending on the
format of the source there may be undue complications. The physical
copies of the Yale Daily News held in the Library's Manuscripts and
Archives department are tightly bound which hinders how well the
volumes can be opened. Tightly bound materials pose a few risks for
digitization projects. Not only can they be damaged during scanning
but they can also inhibit producing quality digital images. Microfilm is
easy and inexpensive to digitize but by nature is already a derivative
format; it will always appear as a black and white photograph of an
original document. Microfilm can also be of poor quality, out of focus,
smeared, scratched, or otherwise unreadable. Pictures that have been
microfilmed often have lost much of their detail.
Luckily, the Yale Club of New York had a complete copy of the
Yale Daily News and was willing to donate it to the project. This third
copy allowed us to avoid the difficult decision of inexpensively
scanning microfilm, scanning bound volumes at a high cost, or
disbinding our only physical copy, which would balance the cost and
the quality but would leave us with individual leaves to box and store.
We were able to disbind, scan and discard these volumes without
compromising library collections.
The question of how much storage space would be required for
the thousands of images we would receive impacted a number of basic
technical decisions. Bitonal scanning would not represent the
photographs adequately so grayscale or color images would be
required. We felt confident in JPEG 2000 as a reliable file format for
these kinds of materials. Though the Yale Daily News is an essential
part of research at Yale, we were not undertaking a scanning project to
replace the print and microfilm versions of the YDN, but merely to
provide access. JPEG 2000 allows large amounts of information to be
stored in a compressed form, with fewer of the compression artifacts
and other data losses typical of standard JPEG files.
Other universities digitizing complex text and newspapers were
using METS (The Library of Congress, 2010b) along with ALTO (The
Library of Congress, 2010a). The METS files enable the program to
understand the structure of a document, such as a newspaper issue
with 16 pages. These files were considered "required elements" for our
projects, as they would enable us to search within the issues. The
ALTO files provide technical metadata for optical character
recognition. The combination of the METS files, which describe the
issue of the newspaper, with the ALTO files, which describe the layout,
enables full-text searching and the highlighting of search words within
the display of text. The user sees a representative image (a digital
photograph) of a page. The searchable text created from OCR is
stored in a different file, in essence creating a layered document. The
METS/ALTO files link these two such that when a portion or zone of the
image is highlighted it is associated with the text it represents. Users
can then search for phrases and see the matching term highlighted on
the page image. They can also select and copy the text directly from
the image.
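The link between page image and OCR text described above can be illustrated with a minimal sketch. It assumes a simplified, namespace-free ALTO-like fragment written for this example (real ALTO files declare an XML namespace and carry far more detail); the sample data and the function names find_word_boxes and page_text are hypothetical, not part of the project's software.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO-like fragment: each String element holds one
# recognized word plus its position and size on the page image.
ALTO_SAMPLE = """
<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Yale" HPOS="120" VPOS="80" WIDTH="60" HEIGHT="22"/>
            <String CONTENT="Daily" HPOS="190" VPOS="80" WIDTH="70" HEIGHT="22"/>
            <String CONTENT="News" HPOS="270" VPOS="80" WIDTH="64" HEIGHT="22"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def find_word_boxes(alto_xml, term):
    """Return (x, y, width, height) for every word that matches term."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for word in root.iter("String"):
        if word.get("CONTENT", "").lower() == term.lower():
            box = tuple(
                int(float(word.get(key)))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            boxes.append(box)
    return boxes


def page_text(alto_xml):
    """Join the recognized words into plain text for searching or copying."""
    root = ET.fromstring(alto_xml)
    return " ".join(word.get("CONTENT", "") for word in root.iter("String"))
```

A viewer could draw a highlight rectangle at each box returned by find_word_boxes, while page_text supplies the text layer that users search and copy.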
Yale was not in a position to write software to provide access to
METS/ALTO files for their full functionality. Instead, we decided to use
CONTENTdm, a commercial product that would suit our needs and
fully utilize the METS/ALTO structure. In order to streamline the
process, we decided to employ vendors to provide as many of these
services as possible. The creation of
the METS/ALTO files was included in our request for proposal (RFP)
as "Highly Desirable Elements," allowing us to see potential solutions
our scanning vendors could perform without totally rejecting a
proposal if a vendor could not provide them.
In choosing the issues to digitize it may have seemed obvious to
start with the first issue. However, since the OCD and the Library were
going to need to fundraise to continue the project, both the Library
and the OCD felt it would be best to do a range of interesting time
periods to generate interest with potential donors. In addition, an
interesting grant funded project that was already underway was
digitization of World War I posters and pamphlets held in the YUL
collection. We decided to tie into that project and digitize issues from
1913-1919. In consultation with Manuscripts and Archives, we
learned that archivists there use the Yale Daily News frequently in
answering reference questions, particularly for the period beginning when A.
Bartlett Giamatti was President of Yale. The records of Yale's
President, and of many university offices, were closed to researchers for
thirty-five years. The YDN thus provides the best available material
for historical research on Yale activities between 1978 and 1981.
Finally, we decided to digitize a very exciting time period that is
always in demand amongst students, 1967-1970. During this time Yale
and New Haven experienced student protests, a murder and resulting
controversial trial of a member of the Black Panthers, and a move
toward co-education.
The pilot batch consisted of 800 issues from 13 years, or
approximately 8,000 pages. We were fortunate to be included in a gift
from the Yale Class of 1945W, the class of students from
1939-1945 whose initial Yale education was interrupted by World War
II and who completed their degrees in an accelerated program upon
returning. This provided another 6 years of material bringing the total
number of pages to 24,000.
Selection of Vendor and System
In July 2006 we sent the RFP to four vendors and received responses.
Most of the vendors were able to fulfill our required and desired
elements using similar technology. The RFP process and careful
evaluation of the samples supplied by the vendors was illuminating.
Through this process, we were able to learn more about the way
different vendors work and we had a chance to test drive the solutions
to our problem and evaluate different scanning techniques.
All of the vendors used DocWorks software to OCR the files, zone
the articles, and create the METS/ALTO files. Then they prepared the
package of files for loading into CONTENTdm, using a special loader
developed by CCS, the company that created DocWorks. We were
able to provide access to the newspaper by using CONTENTdm. This
software package, used by many academic libraries, enables users to
do full text searching on an issue or across all of our newspapers. The
functionality not only allows viewing images of each page but also
permits the user to click on an article to view the complete article by
itself. The recognized text is also accessible, useful for cutting and
pasting. CONTENTdm can automatically create PDFs of the issue on
the fly for printing. Finally, because of the zoning and the ALTO files,
when an article goes on to another page, CONTENTdm pulls all of the
parts of an article together into one screen, making for easy reading.
We chose to work with Digital Divide Data, a nonprofit company
based in New York City. Digital Divide Data employs young people in
third world countries. Their employees not only learn IT skills but also
attend classes part time in an effort to break the cycle of poverty.
Digital Divide Data gave us several options for scanning the
newspapers. Because they are old, the newspapers had a yellowish
tinge. We chose to scan the newspapers in grayscale, which provides a
smaller file size and quicker loading of CONTENTdm. We also chose
to process the text so that the background and text have a high
contrast, while leaving the images in grayscale. The result is easy to
read, easy and quick to print, and true to the original intent of
newsprint.
Workflow
For strategic reasons, YUL does not scan the newspaper issues
chronologically. This complicates tracking, which is nonetheless a high
priority: the various participants need to be able to see at any time the
status of the project and whether any part is held up. A database was
created to track each volume of the newspaper. Fields were added,
including a pull-down list for the status of each volume. Because
various personnel throughout the library work on the project,
this database helps everyone know the status of every volume.
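A minimal sketch of such a tracking database follows, using SQLite from the Python standard library. The chapter does not describe the actual database software, schema, or status values, so the table layout, the STATUSES list, and the function names here are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical status values; the real pull-down list is not given in the text.
STATUSES = ("unfunded", "funded", "prepared", "shipped", "in QC", "live")


def open_tracker():
    """Create an in-memory tracking database with one row per volume."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE volumes ("
        "volume INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    return db


def add_volume(db, volume, status="unfunded"):
    """Register a volume with an initial status."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    db.execute(
        "INSERT INTO volumes (volume, status) VALUES (?, ?)", (volume, status)
    )


def set_status(db, volume, status):
    """Change the status of one volume, mimicking the pull-down list."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    db.execute(
        "UPDATE volumes SET status = ? WHERE volume = ?", (status, volume)
    )


def volumes_with_status(db, status):
    """List the volume numbers currently in the given status."""
    rows = db.execute(
        "SELECT volume FROM volumes WHERE status = ? ORDER BY volume",
        (status,),
    )
    return [row[0] for row in rows]
```

With a structure like this, staff preparing shipments could ask for every volume marked "funded" and work through that list next.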
When volumes have funding designated the status is changed so
staff knows to prepare those volumes next. The volumes are collated
and a manifest is created that includes basic information such as the
volume, number, date, and page count in each issue. Missing volumes,
printing errors in the enumeration, and possible inhibitions to
scanning are also recorded. The volumes are then disbound with the
manifest in hand. Pages missing a significant amount of text are
removed along with the rest of the issue they belong to. These issues are
recorded in the database under "Issues Needing Replacement."
Microfilm will be used to scan missing issues and, in order to keep
visual consistency, entire issues are scanned from the same source.
For efficiency's sake, missing issues are gathered and the microfilm is sent
and scanned in batches.
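The manifest described above can be pictured as one record per issue. The sketch below is only an illustration: the field names, the sample rows, and the helper functions are hypothetical, and the chapter does not say how the real manifest is stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IssueRecord:
    """One manifest line: basic facts recorded while collating a volume."""
    volume: int
    number: int
    published: date
    pages: int
    needs_replacement: bool = False  # e.g. significant text loss
    note: str = ""                   # printing errors, scanning obstacles


# Hypothetical sample rows, invented for this example.
SAMPLE_MANIFEST = [
    IssueRecord(90, 1, date(1968, 9, 16), 8),
    IssueRecord(90, 2, date(1968, 9, 17), 6, needs_replacement=True,
                note="page 3 torn, text missing"),
    IssueRecord(90, 3, date(1968, 9, 18), 8, note="misnumbered as no. 4"),
]


def total_pages(manifest):
    """Page count the vendor should return for this volume."""
    return sum(record.pages for record in manifest)


def issues_needing_replacement(manifest):
    """Issues to be scanned later from microfilm, as (volume, number)."""
    return [(r.volume, r.number) for r in manifest if r.needs_replacement]
```

Totals computed from the manifest give a simple check that the files returned by the vendor match what was shipped.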
Once a volume has been disbound the manifest is printed and tied
with the volume. Preparing volumes is frequently performed in
batches between shipments. This allows us to have a number of
batches ready and waiting until the shipment is due making deadlines
easier and less harried.
Regular shipments are usually sent to Brechin Group, a
digitization vendor subcontracted by Digital Divide Data, at the
beginning of each month. After Brechin scans the volumes, the digital
files are then stored to hard drives and sent to Digital Divide Data's
Cambodia office for processing where the images are zoned, metadata
is created, and the CONTENTdm files are created. Once processing is
complete the hard drives are then sent back to Yale where the files are
uploaded directly into our CONTENTdm test server for quality control.
Quality Control
Given the quantity of items in every shipment, it is not possible to
check every page. Instead sample issues are checked from every batch
returned by the vendor using the ANSI/ASQ Z1.4-2003 (American
Society for Quality, 2003) standard quality control procedures. This
standard clearly defines how many items need to be sampled, the
acceptable error rate, and when to accept or reject a batch based on
the number of errors discovered. This standard was developed based
on, and is almost identical to, an old military standard (i.e., MIL-STD-
105E, Department of Defense, 1989) for inspecting shipments.
This method is based on the idea that there is a level of error one
is willing to tolerate but if the error rate is too high the entire batch
will be rejected and reprocessed. The Yale Daily News Project
identifies four separate areas that need to meet specified quality rates:
Image Quality, Zoning, Headlines & Authors, and OCR. If a particular
area fails the vendor only has to redo that particular area, not
reprocess the entire batch from the beginning. This helps zero in on
particular problems for our vendors, though it complicates how we
select the samples. Ideally, the sample would be totally random, but
this is not feasible considering the samples are based on the particular
units being checked; so the image quality is based on each page as a
unit, but the zoning error rate is based on each article as a unit. It isn't
reasonable to line up all the pages, let alone the articles, and randomly
sample them considering complete issues are loaded into
CONTENTdm. To get around this, a system was created to convert
articles and pages into whole issues to be checked.
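One way such a conversion could work is to draw whole issues at random until the number of units they contain (pages or articles) reaches the required sample size. This is an assumption made for illustration; the chapter does not describe the actual method, and the function name pick_issues is hypothetical.

```python
import random


def pick_issues(issue_units, units_needed, seed=None):
    """Draw whole issues at random until enough units are covered.

    issue_units maps an issue identifier to the number of units it
    contains (pages for image checks, articles for zoning checks).
    Returns the chosen issue identifiers and the units they cover.
    """
    rng = random.Random(seed)
    order = list(issue_units)
    rng.shuffle(order)

    chosen, covered = [], 0
    for issue in order:
        if covered >= units_needed:
            break
        chosen.append(issue)
        covered += issue_units[issue]
    return chosen, covered
```

Because issues are taken whole, the sample usually covers slightly more units than the minimum required, which errs on the side of checking more rather than less.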
A quality control tool based on the ANSI standard was created to
guide staff through the process. The manifests for each
batch are added and the inspection level is chosen. The tool then
determines how big the sample sizes are as well as the rejection
thresholds. A random number generator selects the issues that will
comprise the sample. The sample issues are searched in the
CONTENTdm test server where the batch was loaded. Errors are
recorded in the tool along with notes so that they can be easily found
again if needed. Once the check is complete, the tool calculates the
results and tells staff the appropriate action. Once a batch is approved it is moved from
the test system to the live system.
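The accept-or-redo decision for the four quality areas can be sketched as follows. The acceptance numbers passed in are placeholders: in practice they would be read from the tables in ANSI/ASQ Z1.4 for the chosen inspection level and batch size, and nothing here reproduces those tables. The function names are hypothetical.

```python
# The four quality areas named in the chapter.
AREAS = ("Image Quality", "Zoning", "Headlines & Authors", "OCR")


def evaluate_batch(defects_found, acceptance_numbers):
    """Decide, area by area, whether the sample passes.

    defects_found: area -> number of defects seen in the sample.
    acceptance_numbers: area -> largest defect count still accepted
    (placeholder values; real ones come from the standard's tables).
    """
    decisions = {}
    for area in AREAS:
        found = defects_found.get(area, 0)
        allowed = acceptance_numbers[area]
        decisions[area] = "accept" if found <= allowed else "redo"
    return decisions


def batch_approved(decisions):
    """A batch moves to the live system only if every area is accepted."""
    return all(result == "accept" for result in decisions.values())
```

Keeping a separate decision per area reflects the point made earlier: a failure in one area sends only that area back to the vendor, not the whole batch.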
Copyright
When we started the Yale Daily News Historical Archive, the copyright
situation seemed straightforward. Copyright of the YDN belongs to
the Yale Daily News Publishing Co., Inc. This company is run by the
student officers of the YDN, under advisement of a professional
manager. The project was conducted in close collaboration with the
officers of YDN and the Oldest College Daily Foundation, and we had
their permission to digitize and make material freely available. Early
in the project we concentrated on the earliest material from the 1880s.
This material was mainly text written by students, with some
advertisements. There were no photographs. Subsequently material
from World War I was digitized, but that material was pre-1923 and again
consisted mainly of text written by students, although a few
photographs began to appear.
It was not until we started work on material from the 1960's that
we grew concerned that we might have a problem. In the 1960's the
YDN began to run comic strips, some of which were produced by Yale
students. Most notably, a strip called Bull Tales first appeared in
September 1968 written by an undergraduate named Garry Trudeau.
After Mr. Trudeau graduated he changed the name of the strip to
Doonesbury which was then syndicated nationally, including in the
YDN. Peanuts also regularly ran in the YDN throughout the 1960's,
1970's and later years. The inclusion of these popular and copyright
protected comic strips raised red flags for us: were we allowed to
include this material, and did we need to seek permission to do so?
We worried that we would need to excise this material from the
digitized copies of the papers.
The comic strips were not unique material that could not be
found elsewhere, but it did seem that at a time of social and political
upheaval at Yale and the entire nation, the YDN staff included the
strips for a purpose, and they did play a part in the tone of the YDN.
Doonesbury was often overt political comment on current events;
Peanuts was less so, but was still part of a social commentary. The
digitized YDN would not be complete without the inclusion of the
strips as they first ran in the paper. We decided that a safe course of
action would be to contact the rights holders, Trudeau and the estate
of Charles Schulz. In both cases, permission was given to use this
material.
We were lucky that this issue was a problem for other projects as
well. In June 2008, at the same time we were granted permission by
the rights holders to run both strips, the Eleventh Circuit Court of Appeals in
Atlanta rendered a decision in the case of Greenberg v. National
Geographic Society siding with the NGS. Greenberg had sued NGS for
including in a digitized version of the magazine material created by
Greenberg and originally published in the printed magazine.
Greenberg claimed that NGS only had permission to use his material
in the original publication, and that in reusing his work in the digital
version it had violated his rights under Section 201(c) of the Copyright
Act. In siding with NGS, the court found that the use of Greenberg's
and others' work was permissible because the magazine was faithfully
reproduced and presented as the original, with material presented in
its original context. This was in contrast to the 2001 finding in New York
Times Co. v. Tasini, where the Supreme Court found that the
rights of freelancers were violated because in the digital product in
question individual articles could be viewed individually, without the
original context of surrounding material. The difference between these
decisions ultimately lies in how the digitized content is displayed. The
publisher has the rights to the issue as a whole and can repackage it as
a whole, but they cannot split out the parts and make them into a
different product. In our presentation of the YDN, each issue is
presented in its entirety as a faithful and full reproduction of the
original. While searches will indicate individual articles (and
occasionally comic strips) the user always finds that material in the
context of its original issue. This decision meant that although we did
get permission from Garry Trudeau and the estate of Charles Schulz to
use their strips, specific permission was not required to
include them.
Challenges involving Workflow
and Data Correction
Anytime a new project starts, especially with a new vendor, there is a
period of adjustment and problems are expected. Vendors work with
numerous customers with various expectations. It is extremely
important for customers to be as clear about their expected outcomes
and requirements as possible. Good vendors will make every attempt
to satisfy those requirements.
In earlier volumes of the Yale Daily News it was not common for
authors to be named in the articles. During a batch of later years,
when it became common to name authors, it was discovered that the
authors' names did not display with other metadata in the header bar
in the pop up window. This was recorded as an error as it was our
understanding that fields tagged as "authors" would show up with the
title of the article. In discussing this with the vendor, we learned that
they were in fact tagging the names correctly, and they gave us a short
list of issues to check in our CONTENTdm installation to find out why
author names were not being displayed.
Other problems were found before the material left the library. As
we started disbinding older volumes there was some damage to the
pages. Examples of types of damage were photographed and sent to
the project staff so decisions could be made on where the cut-off of
acceptable versus unacceptable damage would be made. It was
ultimately decided that some minor loss of text was acceptable as long
as the user could still surmise the lost text and it was not in an area
that may be in a targeted search, such as the title of the article.
Other challenges have come up during the course of the project
but because of our good relationship with the vendor we can work out
solutions easily.
Funding and Sustainability
The project was originally funded with contributions from the YUL
and the Oldest College Daily Foundation. In addition the YUL
matched its original funding amount for one additional year. At the
same time the Library Development Office added the YDN digitization
project to its priority list and began actively seeking contributions. The
Yale Class of 1945W (the World War II years) signed on quickly to
fund the eight academic years covering 1940-1948.
As we moved the project from the pilot phase into production we
faced challenges brought about by the depressed economic climate of
early 2009. This became the most important factor in determining
how the project would continue. Decreases in the Library budget
resulted in the elimination of additional library funds. Therefore,
digitization of additional content is currently being funded entirely by
donor contributions.
The Library's Development Office continues to prioritize this
project on its development list and actively works with potential
donors to identify time periods that may be of individual interest and
are available for funding. Contributions are publicly acknowledged on
the Library's website
(http://images.library.yale.edu/digitalcollections/ydnAcknowledgments.aspx).
As we move forward issues for digitization will be selected based
on a variety of criteria. Donor funding may be given for specific years.
These years will be given the highest priority in the digitization queue.
Next, if donors do not select specific years for digitization, priority will
be given to years that are in demand by researchers at Yale. Specific
years or eras are requested repeatedly - including the 1960's.
Identification, digitization and availability via CONTENTdm of these
specific years can improve services to researchers by providing them
immediate access to the information they require while at the same
time increasing staff productivity by avoiding repeated copying of the
same articles. Once we have digitized all content identified as highly
desirable by researchers we will fill in remaining year gaps beginning
with the oldest content.
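The selection order described above amounts to a three-tier sort: donor-funded years first, then years in demand by researchers, then everything else from the oldest forward. The sketch below shows one reading of that rule; ordering within the first two tiers by year is an assumption, and the function name queue_order is hypothetical.

```python
def queue_order(years, donor_funded, researcher_demand):
    """Order years for digitization by funding, demand, then age.

    years: all years not yet digitized.
    donor_funded: years a donor has paid for (highest priority).
    researcher_demand: years frequently requested by researchers.
    """
    def priority(year):
        if year in donor_funded:
            tier = 0
        elif year in researcher_demand:
            tier = 1
        else:
            tier = 2
        # Within a tier, older years come first (an assumption for
        # the first two tiers; the text states it only for the last).
        return (tier, year)

    return sorted(years, key=priority)
```

For example, with 1945 funded by a donor and 1968 in demand, the years 1950, 1901, 1968 and 1945 would be queued as 1945, 1968, 1901, 1950.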
Conclusion
The Yale Daily News digitization has been a challenging and
rewarding project. It has been a partnership with organizations
outside of the library, such as the OCD Foundation, and also a great
opportunity within the library for various departments to work
together. The core team within the library comes from five separate
departments: Cataloging and Metadata Services, Electronic
Collections, Library Access Integrated Services, Preservation, and
Usability & Assessment. This group meets regularly to work out
issues in which each department has an interest, finding collaborative
solutions, many of which have carried over into other fledgling
projects. In addition, Preservation department staff perform the
tasks of preparing the newspaper for digitization and checking the
quality of the digital files as they return. This process utilized skills the
department already had, expanding them for newspaper digitization.
On the fundraising side it has proven to be a great springboard.
Yale graduates tend to be quite loyal and they have frequently been
very interested in looking back into the digitized YDN content that has
been created. Not only are they giving back to help complete the
project, but it has also been an eye-catching project that opens
conversations in which other potential projects can be discussed and funded.
Finally, digitizing the Yale Daily News has enabled the library to
produce a free, highly useful, and unique digitized resource for both
Yale University and other researchers. The expertise gained in this
project has enabled YUL staff to build on this success with other, more
complex, digital projects. YUL staff can now successfully digitize
varied material such as maps, annotated manuscripts and books. Mass
digitization of books may free libraries to concentrate on high quality
digitization of unique material that is present in nearly all public and
academic libraries. A newspaper digitization project can be an
excellent springboard for digitization of this diverse set of material.
References
American Society for Quality. (2003). ANSI/ASQ Z1.4-2003:
Sampling procedures and tables for inspection by attributes.
Milwaukee: American Society for Quality.
Department of Defense. (1989). Military standard. Sampling
procedures and tables for inspection by attributes. Washington,
D.C.: Department of Defense.
The Library of Congress. (2009). National digital newspaper
program. Retrieved May 20, 2010 from http://www.loc.gov/ndnp/
The Library of Congress. (2010a). ALTO: Technical metadata for
optical character recognition. Retrieved April 10, 2010 from
http://www.loc.gov/standards/alto/
The Library of Congress. (2010b). METS: Metadata encoding &
transmission standard. Retrieved April 10, 2010 from
http://www.loc.gov/standards/mets/
New Jersey Digital Legal Library
Wei Fang (Rutgers University)
Abstract
The New Jersey Digital Legal Library (NJDLL) was launched in 2003
and has been running ever since. NJDLL has sixteen
collections with about 500 video clips and 20,000 electronic
documents. It has served over 300,000 visitors and is popular among
legal professionals and researchers. The goal of this project is to create
a Web-based digital library where patrons can browse and search for
previously unavailable New Jersey legal information. In this article,
the author presents the background history of the NJDLL, how it was
constructed, workflows, and how problems were solved in depth.
Future development suggestions are also discussed.
Keywords: digital library, legal library, XML, LAMP, repository,
Supreme Court, oral arguments.
Introduction
As part of the Rutgers University School of Law, the Rutgers
University Law Library - Newark with its more than half a million
volumes of printed books is the largest law library in New Jersey. Its
collections include state and federal statutes and court decisions,
federal and New Jersey regulations and administrative decisions,
federal and New Jersey legislative history materials, the codes of
ordinances for many New Jersey municipalities, Anglo-American legal
periodicals, the primary materials of international law, extensive
historical materials on English law, and a special collection of
criminology and criminal justice materials. The library has served the
law school and the surrounding legal community for more than a
century. Our law librarians wanted to share the unique collections that
the law library had with others in the world and the only way to do it
was to convert the collections into digital documents and host them
online via the Internet.
The NJDLL project focuses on legal materials related to New
Jersey. The target audience groups are legal professionals, historians,
researchers and the general public. After more than five years of
development, its collections include electronic legal documents as well
as New Jersey Supreme Court oral arguments archives. The NJDLL
project was first proposed in the fall of 2000 by Paul Axel-Lute, the
collection development librarian, and was initiated in the spring of 2002
with hardware and software funded by Rutgers University. In April
2003, the digital library was inaugurated with three collections: 1)
New Jersey Administrative Reports 2) New Jersey Executive Orders
and 3) New Jersey Attorney General Opinions. In 2005, the law
library negotiated with the Supreme Court of New Jersey to be the
host of its Oral Argument video clips using Microsoft Windows Media
Video technology. We are also hosting the digital collection of the New
Jersey Council On Affordable Housing's decisions. The NJDLL has
now grown to become a popular digital library with sixteen collections
ranging from year 1779 to present (see table below).
Collection | Description
Holmes v. Walton | Case file transcriptions (1779-1783) etc.
New Jersey Administrative Reports | First Series; Covers 1979-1991
New Jersey Attorney General Opinions | Covers 1949-1998
New Jersey Council on Affordable Housing (COAH) | Decisions, 1986-Present
New Jersey Department of Human Services, Administrative Orders | Orders from 1977 forward
New Jersey Executive Orders | Covers 1941 - January 1990
New Jersey Governor's Code of Conduct | Text of Code
New Jersey Affordable Housing Archive | Various resources relating to affordable housing in New Jersey
New Jersey Legislative Counsel Opinions | Opinions Since 2004
Supreme Court of New Jersey Oral Arguments | Video Archive of Arguments since 2005
The Law of Slavery in New Jersey | Bibliography, Statutes and Cases
The Mount Laurel Material of The Affordable Housing Archive, New Jersey | History and Selected Documents
Miscellaneous Documents | Topical Legal Documents
Mercer Beasley & Univ. of Newark Law Reviews | Digitized versions of select Law Reviews
Statute Compilations | Compilations of New Jersey Statutes
Weintraub Lecture Series | Selected lectures from the law school's annual lecture series.
NJDLL Collection List
System Architecture
After the project was started, a new position was created, and in
October 2003 the library hired Kevin Reiss as the digital services
librarian; he served the library until March 2006. He played a major
role in setting up the overall architecture of the site. Meanwhile, the
library purchased a new server with two Intel Xeon processors,
two 74 GB hard drives in a RAID 1 configuration, two 100 MB network
interface cards and 8 GB of internal memory. The server was set up in an
air-conditioned, secure room with an automatic fire extinguisher. It was
then connected to a high-speed network for the best user experience.
The library decided to use as much open source software as possible.
The best part of open source software is that it is free. Also, unlike
most commercial software, users can look through the source
code to make sure it is safe to use if necessary. As a result, the entire
project was built with Red Hat Linux, the Apache web server, the MySQL
database server, PHP and an XML parser. We also used the Swish-e search engine
(Swish-e, 2009) for the search functionality. The operating system,
server software and supporting software were installed within a
month. A tape-based system performs a full backup of the server once
a week.
Usability and Accessibility
The library administration identified three collections to get the
project started. After law librarians selected documents in these three
collections and the server was tested, the digital services department
of the law library received the go-ahead to start building the site.
The first and critical step was to determine the digital documents'
formats. There were many digital document formats that we could
choose from, including Adobe PDF, Microsoft Word, Corel
WordPerfect, etc. After some careful evaluations and tests, we chose
Adobe PDF and DjVu as the formats for our digital documents of the
collections. Adobe PDF and DjVu are widely supported by the open
source community and they both have free readers available for PC
and Mac users to download. For all scanned materials, we defined a
protocol that requires all the documents to be saved in 300 DPI
monochrome in both TIFF and JPEG formats. We also
performed Optical Character Recognition (OCR) on scanned
documents. Depending on the size of the collection, the library
outsourced boxes of documents to an outside digitization company.
The library would contact a digitization company and provide
specifications for the company to follow. For smaller and
continually updated collections, instead of outsourcing, the library
purchased a high-speed sheet scanner and an overhead book scanner
to scan all the documents in house, since this method would cost less
and be more convenient in the long run. After the paper-based
documents were converted into Adobe PDF, DjVu, and plain text files,
XML files for the metadata of the documents were composed manually.
The site was built in PHP, with item information stored in a MySQL
database. A menu system was designed that uses an XML file to
generate the menu on the fly. For instance, when a patron visits the
site, PHP produces the page layout and extracts item information
from the database while the XML parser
generates the menu for that particular page. This provides the system
manager with flexibility and accessibility to manage the site.
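As an illustration only, the idea of generating a menu from an XML file can be sketched in a few lines. The site itself was written in PHP; the Python below, the element name `item`, and the attributes `label` and `href` are all assumptions made for this sketch, not the NJDLL's actual menu format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical menu definition; element and attribute names are assumptions.
MENU_XML = """
<menu>
  <item label="Home" href="/index.php"/>
  <item label="Collections" href="/collections.php">
    <item label="Oral Arguments" href="/collections/oral.php"/>
  </item>
</menu>
"""


def render_menu(element):
    """Recursively turn child <item> elements into a nested HTML list."""
    items = element.findall("item")
    if not items:
        return ""
    parts = ["<ul>"]
    for item in items:
        label = escape(item.get("label", ""))
        href = escape(item.get("href", "#"), quote=True)
        # Each entry may carry its own sub-menu, rendered by the same function.
        parts.append(f'<li><a href="{href}">{label}</a>{render_menu(item)}</li>')
    parts.append("</ul>")
    return "".join(parts)


menu_html = render_menu(ET.fromstring(MENU_XML.strip()))
print(menu_html)
```

With this arrangement, changing the menu means editing one XML file rather than every page template, which is the flexibility the paragraph above describes.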
The user interface may be the most critical element in designing
a Web site. Digital libraries have traditionally been regarded as
difficult to use (Borgman, 2000). When the site was constructed in
2003, most people used computers with dial-up connections and
low-resolution monitors. The library, therefore, designed the NJDLL
with a fixed width of 800 pixels. To reduce the network load
and speed up page loading, the site used only one graphic, the New
Jersey logo, on most of its pages. We also designed a simple interface
with an interactive menu list so that users would be comfortable
with it (Makri et al., 2006).
Workflows
As described in the System Architecture section, the library uses
as many open source tools as possible. However, this does not mean
that the library objects to using commercial software. We use Adobe
Acrobat Professional (Adobe Acrobat Professional, 2009) to generate
PDF files and use the Any2DjVu Server (Any2DjVu Server, 2002) to
generate DjVu files from PDF files. Adobe Acrobat costs about
one hundred dollars per university license, and the Any2DjVu service
is free of charge. Similarly, work study students use the commercial
software ABBYY FineReader to perform OCR tasks and the open source
software Notepad++ (Notepad++, 2009) to compose metadata files.
For video files, work study students use Windows Movie Maker from
Microsoft (WMM, 2004) to convert video clips. Although WMM is
not open source, it is freely available.
Throughout the years, the library has developed and improved
three protocols to process digital objects based on the types of the
resources as well as the scale of the resources (Figure RU-1).
Internal Resources
The library defines internal resources as collections over which
the library has total control. Reference librarians are in charge of
identifying each and every item to be scanned. Depending on the size
of the collection, the library administrators decide whether to scan the
documents in house or to outsource them. When a collection is
scanned in house, work study students, hired from the law school and
other departments of the university, are assigned to scan the
documents using the high-speed sheet scanner for loose leaf pages
and using overhead book scanners for rare documents. They also
generate all metadata files associated with each item. When a
collection is outsourced, the library contacts the outside vendor to
provide digitized documents using CDs or DVDs. The digital services
librarian will do a quality check and verify the quantity. If everything
meets the requirements, work study students will be assigned to
compose metadata for the documents.
[Figure RU-1: Workflows. Flowchart of the protocols for processing
digital objects. Recoverable box labels: printed documents in large
quantity; define specifications; ship documents to print shops;
quantity and quality check of received digital documents; digitize
documents and OCR using predefined specifications; convert video
clips using predefined specifications; compose metadata files for
each video clip; compose metadata; post new content.]
When the content is ready to be updated, the digital services
librarian uploads metadata and digitized documents to the server and
runs a UNIX script to generate database records and item information
based on the metadata file. Librarians and IT personnel use SSH to
upload files and run UNIX commands. To simplify daily work, they
also use Webmin and phpMyAdmin to manage the site and the database.
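The step of turning a metadata file into a database record can be sketched as follows. This is not the library's UNIX script: it is a minimal Python illustration in which the XML element names, the `items` table, and the use of an in-memory SQLite database as a stand-in for MySQL are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element names are assumptions.
SAMPLE_METADATA = """
<item>
  <identifier>doc-0001</identifier>
  <title>Sample report</title>
  <collection>sample-collection</collection>
  <file>doc-0001.pdf</file>
</item>
"""

FIELDS = ("identifier", "title", "collection", "file")


def parse_item(xml_text):
    """Extract the expected fields from one metadata record."""
    root = ET.fromstring(xml_text.strip())
    return {name: (root.findtext(name) or "").strip() for name in FIELDS}


def load_item(conn, record):
    """Insert or update one item row, keyed on its identifier."""
    conn.execute(
        "INSERT OR REPLACE INTO items (identifier, title, collection, file) "
        "VALUES (:identifier, :title, :collection, :file)",
        record,
    )


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE items (identifier TEXT PRIMARY KEY, title TEXT, "
    "collection TEXT, file TEXT)"
)
record = parse_item(SAMPLE_METADATA)
load_item(conn, record)
conn.commit()
rows = conn.execute("SELECT identifier, title FROM items").fetchall()
print(rows)
```

Keying each row on the identifier means that re-running the load after a metadata correction updates the existing record instead of creating a duplicate.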
External Resources
External resources are resources that the library has to
acquire from outside and over whose quality the library has little or
no control. The NJDLL has two
collections that use external resources: New Jersey Supreme Court
Oral Arguments (NJSCOA) and legal documents from New Jersey
Council On Affordable Housing (COAH).
The library uses the in-house protocol to scan materials for the
COAH collections. For the NJSCOA, the library assigns work study
students from the law school because they are able to interpret
legal materials. These students are trained to
produce metadata and video clips for this collection based on the files
received from the IT office of the New Jersey Supreme Court. Under a
procedure jointly developed by the Court and the library, after a
case is argued, the IT office transfers video recordings in standard
broadcast resolution, along with transcripts and detailed argument
information, to an FTP server. When we decided to host the NJSCOA
video archives, we developed a new database and procedure to convert
the original high-resolution video clips received from the Court into
low-bandwidth and medium-bandwidth video clips. For every video
recording, work study students produce two video clips, one for
medium and one for low bandwidth, along with metadata in XML format.
One month after the argument date, cases are uploaded to the server
and posted by the digital services librarian using a UNIX script.
Quality Control
Quality is a very important element of any successful Web site. In
fact, a site that does not provide the quality its visitors expect
will soon lose credibility and its audience. Using existing standards
is very important for quality control, product usability, and document
preservation. Work study students will
encode metadata records for each digital record by following the
standards listed below:
1) METS: Metadata Encoding and Transmission Standard
(METS, 2007)
2) EAD: Encoded Archival Description (EAD, 2002)
3) Text will follow the Text Encoding Initiative (TEI) P5 guideline
(P5, 2009)
New work study students need to be trained before they can work
on any assignment. The library has composed many procedures for
the students to follow. Before the metadata records can be used to
generate web pages and shared online, the digital services librarian
will check them for quality.
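Part of such a quality check can be automated before a librarian reviews the records by hand. The Python sketch below is an assumption-laden illustration: it only tests that a record is well-formed XML and that a hypothetical set of required elements is present and non-empty. It does not validate against the METS, EAD, or TEI schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements; the real record structure is not given here.
REQUIRED = ("identifier", "title", "date")


def check_record(xml_text):
    """Return a list of problems found in one metadata record."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as err:
        # A record that does not parse cannot be checked any further.
        return [f"not well-formed: {err}"]
    problems = []
    for name in REQUIRED:
        value = root.findtext(name)
        if value is None or not value.strip():
            problems.append(f"missing or empty <{name}>")
    return problems


good = "<item><identifier>a-1</identifier><title>T</title><date>2009-01-05</date></item>"
bad = "<item><identifier>a-2</identifier><title></title></item>"
broken = "<item><title>Unclosed</item>"

for sample in (good, bad, broken):
    print(check_record(sample))
```

A check like this catches mechanical slips early, leaving the human review for judgments a script cannot make, such as whether a title or subject is accurate.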
Backup Strategy
A modern library should provide two basic functions: preserve
and serve. Alongside serving knowledge to the public, our
library has a long history of preservation. All master scan images in
NJDLL project are stored in uncompressed TIFF format. Metadata
files are composed using XML formats, and we have a well-oiled
backup strategy to ensure that we don't lose these important digital
assets when disaster strikes.
A mirrored RAID 1 hard drive array is used as the first tier of
backup. The IT office controls an external tape drive that backs up the
entire server every week. To save tapes, the IT office recycles each
tape two months after its last backup. The digital services department
also backs up the server remotely, using server-grade external
hard drives as the third tier of backup. Each tier holds a full set of
backups, including normal backup sets and incremental backup sets.
Frequency of backups: one normal backup is performed
monthly, and an incremental backup is performed every week.
The first-tier and third-tier backup devices are kept in two separate
physical locations.
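The monthly-normal, weekly-incremental rhythm can be made concrete with a short Python sketch. The rule used here, that the first weekly run in each calendar month is the normal backup, is an assumption for illustration; the text does not say which week the normal backup falls on.

```python
from datetime import date, timedelta


def backup_plan(start, weeks):
    """Label weekly backup dates as 'normal' or 'incremental'.

    Assumption: the first weekly run in each calendar month is the normal
    backup, and every other weekly run is incremental.
    """
    plan = []
    last_month = None
    for i in range(weeks):
        day = start + timedelta(weeks=i)
        month = (day.year, day.month)
        kind = "normal" if month != last_month else "incremental"
        plan.append((day, kind))
        last_month = month
    return plan


plan = backup_plan(date(2009, 1, 4), 6)
for day, kind in plan:
    print(day.isoformat(), kind)
```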
Problems Encountered and Countermeasures
During the past five years, the biggest technical problem we had
was security. We started the Web site and made it public in 2003. The
site did not have any protection initially, since we thought that, as
an organizational site on a state university's network, we did not
need it. The site was hacked within a month. After the incident, the
server and the site were restored and we installed a firewall and
Tripwire on the server. The site was also reconfigured to accept
administrative connections from Rutgers University IP addresses only.
Both the IT office of the law school and the library monitor activity
on the site continuously and apply server patches as soon as they are
available.
Structurally, the site was built by adding small collections one
at a time. Each collection was unique in its materials and
scope, so the site had to be fine-tuned each time a new
collection was added. When we added the Supreme Court Oral
Arguments to our collection, one big issue was that we had to use the
Court's file naming scheme, which our work study students found
difficult to use and understand. For instance, a dash and an
underscore may make no difference to the student workers, but the
UNIX server treats them differently. We had to communicate with the
Court and agree on a naming scheme between the library and the Court.
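One way to catch dash and underscore mix-ups before files reach the server is to test every name against a single agreed pattern. The Python sketch below uses a hypothetical convention (docket number, underscore, argument date, extension); the Court's real naming scheme is not given in this chapter.

```python
import re

# Hypothetical convention: docket number, underscore, argument date, extension.
# Example: A-12-08_2009-03-02.wmv. The real Court scheme is not documented here.
NAME_PATTERN = re.compile(r"[A-Z]-\d{1,4}-\d{2}_\d{4}-\d{2}-\d{2}\.(wmv|xml)")


def is_valid_name(filename):
    """Return True only if the whole file name matches the agreed pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in (
    "A-12-08_2009-03-02.wmv",   # follows the pattern
    "A-12-08-2009-03-02.wmv",   # dash where the underscore belongs
    "A_12_08_2009-03-02.xml",   # underscores where dashes belong
):
    print(name, is_valid_name(name))
```

Running a check like this at upload time turns a silent mismatch into an immediate, explainable rejection.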
A less obvious problem than the naming scheme
was the technical documentation that the site management team has to
write. When the server was installed and the site was built,
everyone involved focused on their work and paid little attention to
writing down technical notes and documents. When some of the staff
members left for other positions, their successors had great difficulty
understanding why a script or the system was designed the way it was.
We now require all staff members to write technical documents on a
regular basis, and new staff members have to read those documents
before performing any work on the server or the system.
The technologies we have used are gradually becoming obsolete.
For example, the library has used Windows Media
Video for years and keeps getting phone calls from Mac users
complaining that they are not able to view the content. Considering
the fact that more and more students and attorneys are using Mac
computers, this is something the library has to address. We are
building our own YouTube-like video repository to host copyrighted
video recordings (Fang, 2009), and we may use the same video
repository to host the Supreme Court oral arguments in the future.
One issue that is hard to get around is the inconsistent quality of
work study students' work. Every semester the library has to hire
students to scan documents, compose metadata, and convert video
clips. As required by Rutgers University, the library can only
choose students within the university. Students may come from the
law school or any other department of the university. Generally,
undergraduate students produce the lowest-quality work: they are
unfamiliar with legal documents and make frequent errors. Law school
students, on the other hand, perform all the tasks best. They pick up
the work quickly and enthusiastically. However, they are always on a
busy schedule, especially during exam periods. To address this issue,
the library composes protocols and asks new work study students to
memorize them. This reduces errors to some degree, though it cannot
eliminate them completely.
The project has been up and running for more than five years, and
we would like to track Web site activity. The Apache web server
provides a log file and the server has software that can analyze it.
However, that software does not provide many details on each page or
file viewed by visitors. We tried other methods on the site, including
Web analytics software and some online Web analytics providers. Not all
of these methods were reliable. In one particular case, the Web site
stopped responding because it was waiting to load tracking code from
the tracking provider. After deploying Google Analytics (Google,
2009) on the library's main Web site, we got positive results (Fang,
2007). The library finally decided to use Google Analytics on the
NJDLL site for its functionality and reputation. Now, after two and
a half years of using Google Analytics to track our site, we are
satisfied with the results.
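For readers who want a quick per-file count straight from the server log, the following Python sketch parses lines in Apache's Common Log Format and tallies successful requests by path. The sample log lines are invented, and the script is an illustration rather than the analysis software mentioned above; it cannot supply the visitor-level detail that a hosted analytics service reports.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

# Invented sample lines for illustration.
SAMPLE_LOG = """\
192.0.2.1 - - [10/Mar/2009:10:00:01 -0400] "GET /index.php HTTP/1.1" 200 5120
192.0.2.2 - - [10/Mar/2009:10:00:05 -0400] "GET /docs/a.pdf HTTP/1.1" 200 88211
192.0.2.1 - - [10/Mar/2009:10:01:12 -0400] "GET /docs/a.pdf HTTP/1.1" 200 88211
192.0.2.3 - - [10/Mar/2009:10:02:40 -0400] "GET /missing.html HTTP/1.1" 404 312
"""


def hits_per_path(log_text):
    """Count requests that returned status 200, grouped by requested path."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(5) == "200":
            counts[match.group(4)] += 1
    return counts


hits = hits_per_path(SAMPLE_LOG)
for path, count in hits.most_common():
    print(count, path)
```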
Important skills learned from doing this project
1. Security always has the top priority: the server should be behind a
hardware or software firewall and set up so that
administrators can only log in from the intranet. In case
something bad happens, a fully tested backup plan and
equipment should be in place.
2. Use a uniform naming scheme: when collaborating with another
organization to develop a new collection, it is extremely important
to agree on naming schemes before starting to code. Every detail
has to be thought through, or later on the smallest hole
can grow unmanageable.
3. Document everything: spending five minutes writing a short note
may save you a day of work later.
4. The last day of an old technology is better than the first day of a
new technology: five years may seem a short period in the
technology field, but we have seen a lot of new hardware and
software fade away. Our approach is to stick
with a time-tested, reliable platform as long as it meets our
requirements and visitors' expectations.
Future development
As of now, we have more than 500 video clips and about 20,000
electronic documents on the site. The site attracts about 40,000
visitors annually, which our hardware is just able to handle.
The law library will keep maintaining the site and adding new items
every semester. New collections will also be added once they are
digitized. The biggest goal for the next few years is migrating our
video content from the Windows Media Video format to the more popular
Adobe Flash Video format.
Currently the entire site is indexed by the Swish-e search
engine on a weekly basis. We have had good experiences with this
free search engine, but it is hard to reconfigure and needs some
workarounds to index the XML files correctly. We are
planning to replace the Swish-e search engine with Google's Custom
Search, for which the university has a contract.
When the time comes to replace our current server's software
platform, we will use Ubuntu Server Edition (Ubuntu, 2009) and
Drupal (Drupal, 2009) as the new platform and migrate all the
existing content. Ubuntu Server Edition is a Linux-based, LAMP-ready
operating system. Drupal is an open source content management
system that has been used by thousands of Web sites, including
government Web sites such as whitehouse.gov.
References
Adobe Acrobat Professional. (2009). Adobe and PDF. Retrieved from
http://www.adobe.com/products/acrobat/adobepdf.html
Any2 DjVu Server. (2002). Welcome to the DjVu Any2Djvu Server.
Retrieved from http://any2djvu.djvuzone.org/index.html
Borgman, C. (2000). From Gutenberg to the global information
infrastructure: access to information in the networked world.
Cambridge, MA, USA. MIT Press.
Drupal. (2009). About Drupal. Retrieved from
http://drupal.org/about
EAD. (2002). Encoded archival description version 2002. Retrieved
from http://www.loc.gov/ead/
Fang, W. (2007). Using Google analytics for improving library
website content and design: A case study. In Library Philosophy
and Practice, Annual Vol. 2007 (Special Issue Libraries and
Google). Retrieved from
http://www.webpages.uidaho.edu/~mbolin/fang.pdf
Fang, W. (2009). Online law school video repository: The flash way.
Computers in Libraries, Vol. 29, No. 6.
Google. (2009). About Google Analytics. Retrieved from
http://www.google.com/analytics/
Makri, S., Blandford, A., & Cox, A. (2006). Studying law students'
information seeking behaviour to inform the design of digital law
libraries. Paper presented at the Alicante, Spain.
METS. (2007). METS schema & documentation. Retrieved from
http://www.loc.gov/standards/mets/mets-schemadocs.html
Notepad++. (2009). About Notepad++. Retrieved from
http://notepad-plus.sourceforge.net/uk/site.htm
P5. (2009). TEI: P5 guidelines. Retrieved from
http://www.tei-c.org/Guidelines/P5/
Swish-e. (2009). About Swish-e. Retrieved from http://swish-e.org/
Ubuntu. (2009). What is Ubuntu. Retrieved from
http://www.ubuntu.com/products/whatisubuntu
WMM. (2004). Windows Movie Maker 2.1 Download. Retrieved from
http://www.microsoft.com/windowsxp/downloads/updates/moviemaker2.mspx
Pratt Institute: A Historical Snapshot of
Campus and Area
Paul Schlotthauer (Pratt Institute Libraries)
Abstract
The chapter discusses the planning and procedures for two
digitization grants received from METRO in 2005 and 2006 by the
Pratt Institute Libraries. Despite the creation of a timeline,
unexpected issues necessitated adjustments. Selection of materials
was more labor intensive and time consuming than expected. Our
collection consisted of different formats, which required managing
significant differences in metadata content and interpretation. OAI-
compliancy was problematic with our image management software.
Also, we had modified the Dublin Core fields, which created
compatibility issues with OAI gateways. In 2006, we planned for
potential pitfalls based on our previous experiences, but still
encountered delays and problems, such as color management. In
2008 we mounted our images on Flickr and saw a marked increase in
use.
Keywords: Archives, Best practices, Bookplate, Color correction,
Compliancy, Digitization, Historical images, Image management
software, Imaging, Intellectual control, Metadata Template, Negative,
Pratt Institute, Preservation, Project management, Selection criteria,
Selection guidelines, Special collections.
Introduction
In 2004, when we applied for the METRO grant for digitization, the
library at Pratt Institute's Brooklyn campus had a Visual Resources
Center that, since 1997, had been digitizing its collection of over
160,000 slides for teaching and research purposes. We had staff and
student workers with training and experience in digitization, as well as
dedicated space, equipment, and technical support. Our digitized
images were available online, so we had some acquaintance with the
issues surrounding online collections.
For us, therefore, it was not so much a question of beginning a
digitization initiative as it was beginning a new phase in our already-
existing digitization program. Our hardware was no longer state of the
art. The evolution of metadata schemas and best practices, such as
Western States Digital Imaging Best Practices (later revised by the
Bibliographic Center for Research as BCR's CDP Digital Imaging Best
Practices) (Bibliographic Center for Research [BCR], 2008), meant
that our own protocols were inconsistent if not out of date: they
worked for us, but they were not truly in step with the most current
professional standards. We envisioned a METRO grant as an
opportunity to update and tighten our practices, and believed that the
resulting increase in efficiency would generate momentum to expand
our digitization program in order to better meet the increasing
expectations of our users.
Determining the Team and the Project
When we applied for the grant, we began to assemble the team.
For project manager our Director of Libraries selected our Visual
Resources Curator, who had managed Pratt's digitization program for
over a year and had a background in photography, metadata, and
technology. As the Institute's archivist, I had devoted considerable
effort to researching, organizing, and preserving the archives' large
image collection, and was therefore asked to serve as the selector and
conservator of the materials to be digitized.
Throughout the course of the project, both the project manager
and I took several of the digitization training courses offered by
METRO, including Digital Imaging, Digital Project Infrastructure,
Preserving Digital Materials, Metadata for Digital Collections,
Introduction to XML, Basic Copyright Issues, and Using Adobe
Photoshop for Image Quality Control. These classes provided us with a
firm foundation in various aspects of digitization that enabled the
project to proceed more efficiently.
For our scanning technician we selected a graduate student from
the School of Information and Library Science who possessed
considerable knowledge of imaging, metadata, and Web content
management issues, and who had prepared, scanned, retouched, and
cataloged many images in the Visual Resources Center. We also began
to draft a job description for a project cataloger.
In determining the experience and training needs of the project
team, we adhered to the guidelines of the North Carolina Exploring
Cultural Heritage Online (ECHO) initiative and addressed four
principal issues: conservation, digitization/encoding,
metadata/cataloging, and technical development/support (NC ECHO,
2007). We also planned to use a graduate student from Pratt's
Communications Design Department to develop the Website for this
project. As with our scanning technician, we involved both our School
of Information and Library Science and our School of Art and Design
in a truly collaborative effort, which gave weight to our assertion that
digitization was relevant to the entire Institute, not just the Libraries.
As the image curator, I was faced with the problem of what to
select from the archives and Special Collections for digitization, given
the embarrassment of riches at the Institute and METRO's limit of
five hundred images per institution. As Janet Gertz (2007) has stated,
"In selecting well, institutions of all sizes and types concentrate on the
parts of their collections that are best suited to digitization, make the
most effective use of the technology, and meet their clients' needs. . . .
Good selection decisions come through carefully assessing the
physical nature and content of the original materials, the intellectual
property rights connected with them, and the requirements for a
technically sound, well-described, and cost-effective product that
serves both users' need for access to the content and the institution's
need to preserve the materials" (Introduction, paras. 1 and 2). I
therefore decided to make an eclectic selection from several
collections rather than focusing on one format or theme.
I based my broad criteria on subject matter and quality of image.
My goal was to illustrate the breadth and diversity of the collections
while emphasizing various facets of the Institute as an academic
institution, a presence in the local community, and an important
influence in the fields of art, architecture, fashion, design, and
education. Our archival image collection existed in a variety of
formats, including photographs, slides, and negatives, and focused
primarily on such aspects as the architecture and grounds of the
campus and the Clinton Hill section of Brooklyn in which it is located;
student work in art, architecture, fashion, and design; and campus life
and activities. I selected a number of historical photographs of Pratt's
buildings, some of which date from the nineteenth century and are
well known for their architectural features.
The campus, a twenty-five-acre oasis in the historic district of
Clinton Hill in northern Brooklyn, was also represented, in part
because early images of the campus afford views of surrounding
Brooklyn streets and include long-demolished buildings and
structures. There were also photographs of classes in session and
students at work dating from the early decades of Pratt's history.
These were significant from the standpoint of the history of
education. They depicted the state-of-the-art studios and equipment
used for classes in such fields as drawing, woodworking, engineering,
and physical education. Also important from a socioeconomic
standpoint was the presence of women and minorities in many of
these photographs, demonstrating the Institute's progressive outlook
in an era when racial and gender discrimination were the norm in the
United States. I included images of the founder and his family, as well
as some of the mansions they built, as comprising an important part
of Brooklyn's history. Finally, I chose examples of student work, such
as architectural models and drawings, designs for clothing and
interior decoration, sculpture, prints, and posters, which illustrated
the outstanding body of work produced by Pratt students during its
existence. Many of the examples in this category came from negatives
dating from the 1950s through 1970s, and all gave evidence of the
creativity fostered by the Institute's faculty.
From Special Collections I selected prints of hand-colored
stenciled fashion plates from La Gazette du Bon Ton, considered the
most influential French fashion magazine of the early twentieth
century, as well as representatives of Pratt's collection of nineteenth-
and twentieth-century bookplates. The fashion plates could serve as
resources for fashion students and designers, while the bookplates
were outstanding examples of period book art and typography.
We were gratified to receive a one-year grant from METRO to
digitize this selection. We first created a timeline incorporating all the
major activities, which allowed us to track our progress and, at the
mid-point, enabled us to make the adjustments necessary to complete
it in time (Figure PRAT-1).
Imaging took place on an iMac G5 (20-inch, 1.8 GHz, 1 GB RAM,
250 GB hard drive), an Epson Expression 10000XL scanner, and
Adobe Photoshop Creative Suite software, all made possible through
the METRO grant. The raw and enhanced TIFF files were saved to
Gold Mitsui CD-Rs and a 1.6 TB LaCie external hard drive (the LaCie
drive not only allowed secondary storage and backup but at times
became a shared drive for the various computers used for this
project). The resolution had to be determined by the format: the
photographs, bookplates, and fashion plates were scanned at 600 dpi,
whereas the negatives were scanned at 1,200 dpi.
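The reason a smaller original needs a higher resolution becomes clear from the arithmetic. In the Python sketch below, the physical sizes (an 8 x 10 inch print and a 4 x 5 inch negative) and the 24-bit color depth are hypothetical values chosen for illustration; only the 600 dpi and 1,200 dpi settings come from the project.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan of the given physical size."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Approximate size of the raw image data, ignoring file headers."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical originals; the actual sizes of the Pratt materials are not stated.
photo = pixel_dimensions(8, 10, 600)      # 8 x 10 inch print at 600 dpi
negative = pixel_dimensions(4, 5, 1200)   # 4 x 5 inch negative at 1,200 dpi

print(photo, negative)
print(uncompressed_bytes(*negative) / 1_000_000, "MB")
```

Under these assumed sizes, doubling the resolution for an original half as wide and half as tall yields the same pixel dimensions, so prints and negatives end up with comparable detail and comparable storage needs.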
Once we received the grant, and while the initial scanning was
taking place, the project manager, scanning technician, and I formed a
search committee to recruit a digital archive cataloger. This person's
ability to capture the descriptive and technical metadata would be a
crucial element in the success of this project, so we insisted that
candidates possess experience and knowledge of MARC21, AACR2,
Library of Congress Subject Headings, Dublin Core, and emerging
descriptive metadata standards.
[Figure PRAT-1. Schedule for the 2005 Grant. Month-by-month timeline
from 02/05 through 12/05 with rows for Images Scanned/Edited, Records
Catalogued, Website Design, and Promotions.]
We also expected knowledge of authority control in the online
environment, as well as experience using bibliographic utilities and
assigning metadata to digitized materials for Web-based display and
searching. During the interviews, in addition to asking the usual
questions, we gave the candidates several images and asked for on-
the-spot descriptions; this was an effective means (with considerably
varied results) of determining the candidates' observation powers and
the kind of descriptive information they would be likely to provide.
The search committee's diligence was rewarded with a candidate who
produced outstanding work.
Descriptive metadata was to follow the Dublin Core elements of
title, creator, subject, description, date digital, date original, format,
digitization specifications, resource identifier, rights management,
contributing institution, publisher, contributor, type, source,
language, relation, and coverage. Technical metadata was to include
resolution, size in Kilobytes, height and width in pixels, bit depth, and
color space. Library of Congress Subject Headings, as well as LC's
Thesaurus for Graphic Materials, were to provide the controlled
vocabulary for cataloging content.
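A metadata template of this kind lends itself to a simple completeness check. The Python sketch below lists the descriptive and technical fields named above and reports which ones a record still lacks. The underscore field names and the sample values are assumptions for illustration, not the project's actual template.

```python
# Field names follow the elements listed in the text; the underscore
# spellings are an assumption made for this sketch.
DESCRIPTIVE_FIELDS = [
    "title", "creator", "subject", "description", "date_digital",
    "date_original", "format", "digitization_specifications",
    "resource_identifier", "rights_management", "contributing_institution",
    "publisher", "contributor", "type", "source", "language", "relation",
    "coverage",
]
TECHNICAL_FIELDS = [
    "resolution", "size_kb", "height_px", "width_px", "bit_depth", "color_space",
]


def missing_fields(record):
    """Return the template fields that are absent or blank in a record."""
    return [
        name
        for name in DESCRIPTIVE_FIELDS + TECHNICAL_FIELDS
        if not str(record.get(name, "")).strip()
    ]


# Hypothetical partial record.
sample = {"title": "Sample image", "resolution": 600, "bit_depth": 24}
missing = missing_fields(sample)
print(len(missing), "fields still to fill:", missing)
```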
Challenges
Scheduling
Despite our careful planning, we encountered a variety of
challenges, several of which affected our schedule and required
adjustments. Perhaps the greatest problem was maintaining a
consistent workflow. All the team members except for the cataloger
took on the project in addition to their usual responsibilities, which
impeded the conducting of regular meetings and delayed weekly or
monthly goals. Because we were spread rather thin at times, it was
difficult to maintain good communication within the group. We all
recognized how quickly we would lose control if the team were
disjointed, so we made special efforts to touch base regularly and keep
each other informed through e-mail exchanges and periodic meetings.
Thus, when the project director left before the project was completed,
it was relatively easy for me to step in and take her place.
Selection and Scanning
Because we did not focus on a specific theme or collection, I
needed to employ multiple criteria in my selection process, including
artistic quality, historical significance, and research or special-interest
potential, to assemble a collection both unique and engaging on
various levels. I tried to operate within the parameters set forth in
several resources on this issue (Gertz, 2007; Hazen, Horrell & Merrill-
Oldham, 1998; Southeastern NY Library Resources Council
[SENYLRC], 2004), but even so, freedom of choice made it difficult to
adhere to the maximum number of images mandated by METRO
without omitting something I thought especially interesting, and in
fact I had to pare down the number of selections several times. This
was more labor intensive and took up more time than anticipated, and
thus a certain amount of haste was required given the time frame.
Even so, I was late in delivering the final selection. For future projects,
we knew we would have to begin the selection process sooner and
allow more time for its completion. Of course, focusing on one theme
or collection would also ease the problem.
Another important reason for the delay was the fact that only the
negatives had pre-existing unique identification numbers. The
bookplates, fashion plates, and most of the photographs had no
identifiers at all. We had to devise a system for these at the time of
selection. The simple act of numbering all these images, especially for
those in fragile condition, was extraordinarily laborious and time
consuming, even with the aid of student workers, yet it would have
been disastrous not to have done it. Given the time constraints for this
project we had no choice but to plow ahead, but, based on this
experience, I can certainly endorse the view that, whenever possible,
one should avoid digitizing materials lacking a minimal level of
intellectual control.
The project's wide range of materials also made the development
of a systemized workflow and management of the digital results
difficult. The scanning process required a tremendous amount of
patience, especially with the negatives, for which the time needed to
correct the images was considerably longer than for the prints and
plates. The negatives, being film, showed dust and scratches much
more prominently than the other formats, especially at the high
resolutions we were using. Our Epson scanner had no Digital Image
Correction and Enhancement (ICE) functionality for filtering, and we
chose not to use the dust-removal feature native to the Epson scanner
because we feared it was insufficiently robust. That meant depending
on the tedious and time-consuming process of editing out dust and
scratches by hand via Photoshop's clone stamp tool. In retrospect, it
might have been better for us to use Epson's dust-removal feature,
sophisticated or not, for the sake of saving much-needed time.
The bookplates in particular presented problems regarding
accuracy of tone and color reproduction. Following the National
Archives and Records Administration's recommendations in
Guidelines for Digitizing Archival Materials for Electronic Access
(Puglia, Reed & Rhodes, 2004), we included a Kodak Q-13 gray scale
target with each scan, which improved the accuracy and consistency of
color and tone reproduction. We also used ICC color profiles,
converting the ColorSync Workflow profile (standard for Epson
scanners) to the Adobe RGB 1998 workspace for post-processing in
Adobe Photoshop. We were not entirely satisfied with the results, and
realized that color management software would have been useful for
profiling and calibrating our scanners and monitors. In addition, the
presence of several large windows in the Visual Resources Center
meant that lighting was highly variable and rendered difficult the
comparison of the originals against the digitized images to check
quality and accuracy for color, brightness, and tone. The process was
less consistent and more subjective than we would have preferred, but
that was a drawback of being housed in a building constructed in 1896
with windows intended to provide as much natural illumination as
possible.
Another obstacle was the condition of many of the items,
especially the photographs, some of which dated from the nineteenth
and early-twentieth centuries and suffered from cracks, brittle paper,
peeling emulsions, and other damage. They were housed in archival
polyethylene or Melinex sleeves for protection, which meant the
scanning technician had to remove and replace the images from the
sleeves carefully to avoid inflicting further damage. The technician,
after scanning, made attempts in Photoshop to minimize some of the
visual imperfections that appeared in the scans because of the
condition of the originals. This was a time-consuming process.
However, this project was as much about preservation as it was access
and outreach. Several images were chosen precisely because they
would not last much longer.
One positive result of these problems was the creation of
standards for imaging workflow and various file-management issues
such as naming conventions and folder hierarchy systems. In one way,
the scanning issues we had to confront were beneficial (albeit
annoying), because they provided us with a solid foundation of
practical experience working with a variety of formats that would
allow us in future projects to anticipate potential problems and enable
us to construct schedules that would accommodate them. They also
gave us a greater understanding of the logistics involved in the
deceptively simple and clear-cut process of "digitization," which
increasingly has become, for the uninitiated, a buzzword for quick-
and-dirty reproduction and document delivery.
Metadata
The nature of our project also contributed challenges for
metadata creation. The diverse nature of the items required the
cataloger to quickly gain expertise in cataloging bookplates, fashion
plates, and historical photographs and negatives. Research performed
in one subject area did not usually apply to another; for instance,
knowledge of the history of Pratt Institute, necessary for the prints
and negatives, had no bearing on the fashion plates and bookplates.
As Pratt's archivist, I was the main source of information for questions
about the historical images, but researching the bookplates and
fashion plates was a bit more complicated. The cataloger had to
consult resources at Columbia University to obtain background
information on the bookplates, their creators, and the historical
techniques used to create them.
The cataloger had to use a slightly different approach for each
type of material. The format differences across collections were a
given, but the interpretation of the creator, description, and subject
fields often varied as well. For example, the description field for a
fashion plate with printed text in French required a different format
and style than that for a photograph of an architectural drawing. In
hindsight, it would have been more efficient and cost-effective had we
focused on the metadata for one type of material at a time.
As they did with color reproduction, the bookplates proved
especially problematic. Many of the artists were identified by only a
monogram or other symbol, and a large number were undated, but the
cataloger simply lacked the time to research and identify each plate
more thoroughly. The techniques used to produce the plates (such as
etching or engraving) might also have been determined for many of
the plates had there been additional time or cataloging resources,
though the cataloger was able to do this for some of the plates. There
were issues at the quality-control stage as well. For example, we
initially entered the bookplate owners' names in the format Firstname
Lastname. When we realized that this produced idiosyncratic sorting
in our image presentation software, however, we changed the order to
Lastname, Firstname. This was a relatively small change in the
metadata template, but it proved time consuming because it had to be
implemented in each individual record.
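A change of this kind can often be scripted when the records can be exported and re-imported in bulk. The following is only a minimal sketch under that assumption; the record structure (a list of dictionaries) and the function names are hypothetical and do not describe Luna Insight's actual import or export format. It treats the last word of a name as the surname, so compound surnames and institutional owners would still need manual review.

```python
def invert_name(name: str) -> str:
    """Turn 'Firstname Lastname' into 'Lastname, Firstname'.

    Names that already contain a comma, or that consist of a single
    word, are returned unchanged, so the function is safe to re-run.
    """
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name or " " not in name:
        return name
    first, last = name.rsplit(" ", 1)  # assumption: last word is the surname
    return f"{last}, {first}"


def invert_owner_names(records, field="Owner"):
    """Return copies of the records with the given field inverted.

    Records that lack the field are copied unchanged.
    """
    return [
        {**rec, field: invert_name(rec[field])} if field in rec else dict(rec)
        for rec in records
    ]
```

Because already-inverted names pass through untouched, the same script could be run again after new records are added without damaging earlier corrections.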
We also had to modify the Dublin Core metadata template in
order to provide more effective descriptions of the bookplates' unique
elements. For instance, the person for whom a bookplate was made is
as important as the artist who created it, and the motto, when there is
one, becomes a major means of identification. We had to
accommodate these features by creating the elements Owner and
Motto, and we changed the Creator field to Artist because we felt
many people might misinterpret Creator to refer to the printer of the
physical object rather than the artist who designed the plate.
Additional modifications were also made (Figure PRAT-2).
Simple Dublin Core Fields    Modified Dublin Core Fields for Bookplates
Identifier                   Identifier
Source                       Source
Date                         Date Original; Date Digital
Creator                      Artist
Title                        Owner
Coverage                     Place
Description                  Description
Subject                      Subject
Language                     Language
Format                       Format
Type                         Type
Rights                       Rights
Contributor                  Contributing Institution
Publisher                    (none)
Relation                     (none)
(none)                       Motto
(none)                       Keywords
(none)                       Digitization Specifications
(none)                       Digitizer
(none)                       Cataloguer
Figure PRAT-2. Basic and Modified Dublin Core Metadata.
Access to Our Digital Collection
Although many of the grant recipients for METRO's 2005
initiative used CONTENTdm as their image management and
presentation software, Pratt had made a decision, prior to and
independent of the grant, to purchase Luna Insight software, which
was used by the Getty Museum and other important art institutions.
The reasons for this had to do with numerous organizational and
storage features deemed necessary for the needs of our faculty and
students: unlike some institutions, Pratt's primary and overarching
digitization concerns had to focus on the continuing research and
pedagogical needs of art, design, and architecture programs at both
the undergraduate and graduate levels. This decision was not without
consequences when it came to creating access to our METRO
collection.
To begin with, implementation of Luna Insight into the browser
environment was a slow, difficult process; a significant learning curve
was required to resolve issues surrounding Insight's interface and
layout, which was new to everyone. Our greatest difficulty, however,
was making the collections OAI-harvestable. Our systems librarian
joined us at this point to provide assistance, and he discovered that
Luna Insight's databases were not fully compliant because they were
unable to allow selective harvesting based on a specific date stamp;
that is, they could harvest repository files that were static but not
dynamic. Even achieving this level of compliancy was frustrating and
time consuming because OAI harvesting was naturally not Luna's
responsibility (in fact, Luna's support staff told us that Pratt was its
first client utilizing OAI harvesting for Insight collections). Therefore,
it became our sole responsibility to generate the files, intermediate
them with a gateway, and register them with a harvester, although no
member of the team had done this before.
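For readers unfamiliar with the term, selective harvesting by date stamp means that a harvester asks a repository only for records added or changed within a date window, using the optional from and until arguments of the OAI-PMH ListRecords request. The sketch below merely builds such a request URL; the base URL is a placeholder, not an endpoint we used, and the function name is our own.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date and until_date are optional datestamps (for example
    '2006-01-01') that limit the harvest to records changed in that window.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL, for illustration only.
print(list_records_url("http://example.org/oai", from_date="2006-01-01"))
```

A repository that cannot answer requests restricted in this way can still be harvested in full each time, which is the behavior a static repository file supports.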
With some difficulty we were able to generate static repository
files with the Luna Insight software, after which we had the collections
registered and intermediated through the Los Alamos National
Laboratory's (LANL) open-source experimental OAI Static Repository
Gateway software (Srepod). This was not an easy process because our
inexperience with OAI gateway software resulted in a steep learning
curve. Once we had gained sufficient mastery of the software,
however, we chose to register our collections with OAIster, though
here, too, we encountered difficulty, because our attempts were met
with persistent error messages. Eventually we had to contact a
member of the OAIster team at the University of Michigan, who was
able to assure us that our collection had been registered successfully.
What made the OAI harvesting process so challenging was the
fact that it comprised three distinct steps and was supported by three
independent systems: Luna Insight, Srepod, and OAIster. Luna was
unable to help us with Srepod and OAIster, Srepod was unable to help
us with OAIster, and OAIster was unable to help us with Srepod, yet
the three steps of the process had to mesh with one another in order to
achieve success. Moreover, as open-source systems, Srepod and
OAIster understandably could not provide extensive personalized
technical support.
We also encountered compatibility issues because, as mentioned
above, we modified the Dublin Core template when cataloging the
bookplates. During the harvesting process, we discovered that OAI
gateways and harvesters require strict compliance of metadata with
the simple Dublin Core schema (i.e., the basic fifteen fields). We were
able to solve this problem by retaining our specialized, bookplate-
specific fields while adding the standard fields we had previously
excluded. We thus ended up with two metadata templates (the
specialized fields and the standard Dublin Core fields) in the same
record, with duplicated values in some of the fields (for example, the
values in our specialized Artist field were duplicated exactly in the
Dublin Core Creator field). In our Luna Insight presentation, only the
specialized fields were displayed; when we harvested for OAI, only the
Dublin Core fields were harvested.
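The duplication described above amounts to a crosswalk from local fields to the fifteen simple Dublin Core elements. The following is a rough sketch of that idea, not the procedure we followed in Luna Insight. Only the Artist-to-Creator pairing is stated in the text; the other pairings are assumptions read from the row positions in Figure PRAT-2, and the record structure and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Local field -> simple Dublin Core element. Only Artist -> creator is
# confirmed by the text; the remaining pairings are assumptions.
CROSSWALK = {
    "Artist": "creator",
    "Owner": "title",
    "Place": "coverage",
    "Date Original": "date",
    "Contributing Institution": "contributor",
}

SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_simple_dc(record):
    """Return (element, value) pairs limited to the fifteen simple DC elements.

    Local-only fields such as Motto have no simple DC equivalent here
    and are left out of the harvested view.
    """
    pairs = []
    for field, value in record.items():
        element = CROSSWALK.get(field, field.lower())
        if element in SIMPLE_DC and value:
            pairs.append((element, value))
    return pairs


def to_oai_dc_xml(record):
    """Serialize the simple DC view of a record as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, value in to_simple_dc(record):
        ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping in one table means the specialized fields remain the single source of the values, and the Dublin Core copies can be regenerated whenever a record is corrected.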
Perhaps no other problems encountered during this project were
as perplexing and complicated as those surrounding OAI compliancy.
Confronting these issues provided us with a greater understanding
and appreciation of the complexity of the process, but it also brought
home the fact that, without considerable personalized technical
support, it was difficult to fulfill an important requirement of the
grant initiative. We also understood that it would be unfair to expect
any system to be able to provide that level of guidance. It was very
much a learning experience, and our systems librarian documented
each step of the process, from the generation of the static repository
files with the Luna Insight software to the final registration of the
collection with OAIster.
2006 and Beyond
As we approached the completion of the 2005 project, our director
urged us to apply for another grant in 2006 to maintain our
momentum. This time, we focused on only one theme. One advantage
of our previous metadata research was the discovery that our
bookplate collection included work by several important American
bookplate artists, and we also recognized the general research value of
bookplates for librarians, antiquarian booksellers, and genealogists, as
well as their beauty, creativity, and technical achievements. Therefore,
we decided to digitize the remaining 1,100 plates in our ex libris
collection, anticipating that the digital dissemination of the plates
would serve not only the Pratt community but also the increasing
number of individuals and organizations, both national and
international, dedicated to the study of ex libris. When METRO
awarded us our second grant, we were delighted with the opportunity
to digitize an entire collection, and one, moreover, that had lain
hidden in Special Collections since the late 1970s. For almost three
decades no one had known about or been able to access these
bookplates; now, anyone with Internet access would be able to.
Although several individuals on the 2005 team had left, a few
remained and, with me as project manager, formed the nucleus of the
2006 group. Despite the personnel changes, however, this project
proceeded more smoothly than the previous one, partly because we
applied the lessons learned from the 2005 project and partly because
we were dealing with only one collection (for which we had already
worked out metadata issues) and therefore did not have to cope with
multiple formats and approaches. Despite our allowing more time to
import the images and metadata, we once again found it necessary to
readjust our timeline because it was not easy to calculate accurately
the amount of time necessary to digitize, enhance, and catalog the
plates, even though we knew in advance how many there were. Also,
we initially thought we could jump start the cataloging by initiating
metadata creation before scanning, but we soon were forced to reverse
the order, because the cataloger needed to zoom in on many images to
determine certain details for the metadata (for instance, whether the
plates were etchings or engravings). In addition, some of the scanning
problems described for 2005 had not been completely corrected at the
time we began the 2006 project, and so it required additional time to
resolve them finally. Fortunately, thanks to the cushion we had built
into our original timeline, the schedule modifications did not affect
the completion of the project (Figure PRAT-3).
Once our online collections went live, we publicized them
throughout the Institute and held an open house to provide
instruction and hands-on demonstrations.
[Figure: original and revised digitization project timelines, charting phases including Preparation, Scanning/Imaging, Metadata, Content Import, and Promotion by month.]
Figure PRAT-3. Schedule for the 2006 Grant.
We also notified various American and international
organizations dedicated to the study of bookplates, such as the
International Federation of Ex Libris Societies (FISAE), because we
anticipated (correctly) that our collection would attract considerable
long-distance interest, and hoped that scholars more knowledgeable
than we would contact us with additions and corrections to our
metadata. Several sites posted our announcement, and we were
especially pleased to hear from the Australian Bookplate Society,
which noticed that Australia was not represented in our collection and
actually offered us a donation to fill that gap!
In 2008 the Pratt Institute Libraries chose to become a
participating institution in ARTstor, which meant, for us,
discontinuing Luna Insight. Our art, design, and architecture images
would be hosted as a local collection through ARTstor, but that was
not an appropriate repository for those from Special Collections and
the archives because of their unique and (in the case of the
photographs and negatives) proprietary nature. The limitations on our
financial resources meant that whatever image management software
we selected would have to be open source, but time constraints
prevented us from thoroughly examining the variety of available
systems, so our Visual Resources Curator suggested we post them on
Flickr. This turned out to be an excellent idea, because we very quickly
saw a marked increase in visitors, some of whom supplied information
that, for our bookplates, allowed us to expand or correct our
cataloging, which was precisely what we had been hoping for. The
collection logged over 300,000 views in the first nine months on
Flickr, and not a single image has been viewed less than nineteen
times! Clearly our presence on Flickr allowed individuals to discover
us serendipitously in a way not possible when they were required to
navigate our Website.
I noticed, too, that staff from the Institute's administrative
branches, such as the Public Relations and Communications Office
and the Office of Alumni Relations and Annual Giving, more
frequently browsed the collections before making requests for
historical images, with the result that they knew exactly what they
wanted and were even able to provide us with source and identifier
numbers. This has increased the efficiency of our workflow and
document-delivery capability enormously.
The only drawback with Flickr is its inability to support OAI-
compliancy. We are hoping to acquire an open-source image
management system that will correct this. Meanwhile, we continue to
digitize archival images as needed, and have even begun digitizing
important historical publications frequently used by researchers,
which we are planning to post on the Archives page of the Libraries'
Website. There is no question but that we have experienced a sea
change in our digitization efforts as a result of our two grants, and our
story vividly illustrates not only the old maxim that success breeds
success, but also that long-term impetus can be generated from a
single opportunity.
Acknowledgments: The success of our projects was achieved only
because of the dedication and expertise of the members of the project
teams, whose members included Amanda Schriber, Brian Cross,
Lorraine Smith, Vernon Bigman, Stephen Klein, and Michael Nolasco.
I am particularly grateful to Lorraine Smith, until recently the Visual
Resources Curator at the Pratt Institute Libraries, who was our image
cataloger for the 2005 project and supervised the digitization process
in 2006 as Acting Visual Resources Curator. She was responsible for
coping with many of the technical issues described above and
reviewed this chapter to make sure my explanations were accurate.
References
Bibliographic Center for Research [BCR]. (2008). BCR's CDP digital
imaging best practices, version 2.0. Retrieved from
http://bcr.org/dps/cdp/best/digital-imaging-bp.pdf
Gertz, J. (2007). Preservation and Selection for Digitization
[Northeast Document Conservation Center Preservation Leaflet
No. 6.6]. Retrieved from
http://www.nedcc.org/resources/leaflets/6Reformatting/06PreservationAndSelection.php
Hazen, D., Horrell, J. & Merrill-Oldham, J. (1998). Selecting research
collections for digitization. Retrieved from
http://www.clir.org/pubs/reports/hazen/pub74.html
NC ECHO. (2007). NCECHO guidelines for digitization 2007 (Rev.
ed.). Retrieved from http://www.ncecho.org/dig/digguidelines.shtml
Puglia, S., Reed, J. & Rhodes, E. (2004). NARA technical guidelines
for digitizing archival materials for electronic access: Creation of
production master files - raster images. Retrieved from
http://www.archives.gov/preservation/technical/guidelines.pdf
Southeastern NY Library Resources Council [SENYLRC]. (2004).
General collection criteria guidelines for HRVH. Retrieved from
http://www.hrvh.org/about/selectioncriteria.htm
Scaling Back for an "Experimental"
Collection
Mark F. Anderson (The University of Iowa Libraries)
Abstract
Digital Library Services (DLS) at the University of Iowa Libraries has
progressively worked toward coordinating more large-scale
digitization projects both within the libraries and across campus,
moving away from the model of web exhibits that were often created
before the department was formed in 2005. However, a variety of
situations still call for small-scale projects. This chapter, describing
the design and production of the "W9XK Experimental Television
Digital Collection", shows that small-scale digitization projects can
bridge that gap, and yield collections that rise above the level of web
exhibits in their usefulness to scholars and the general public by
limiting exclusive selection and promoting comprehensiveness. While
mirroring this approach of mass-digitization, digital librarians can
also use curatorial decisions and software functionality to further
assist users of these small-scale collections.
Keywords: Archives, Digital collections, Digital libraries,
Digitization, Web exhibits.
Introduction
Henry Wadsworth Longfellow wrote, "Most people would succeed in
small things if they were not troubled with great ambitions." Even if
the poet was not referring specifically to digital initiatives in libraries,
it's an appropriate sentiment at a time when so much institutional
effort is directed toward mass digitization projects such as the
agreement between Google and the Committee on Institutional
Cooperation (the consortium to which The University of Iowa
belongs) to digitize no less than 10 million volumes from among
members' collections (Committee on Institutional Cooperation, n.d.).
At the same time, it has become common for institutions, or units
within institutions, to work on building small, narrowly-focused
collections as an initial foray into digitization, while securing funding
for the necessary resources to ramp up to large-scale scanning
projects and mass digitization, or to expose exceptional materials.
This chapter will discuss one of these small digital collections, but will
begin with the development of the digital library at the University of
Iowa to which it belongs.
Digital Initiatives at the University of Iowa
The earliest digital collections hosted by The University of Iowa
Libraries date back to the mid-1990s, and were a combination of small
web exhibits and larger efforts to digitize materials from flagship
collections. The web exhibits were considered "virtual versions" of
exhibits displayed in the library's exhibit hall and which combined
highlight images with essays and other supporting text provided by
library staff and campus scholars (University Libraries. University of
Iowa, 1999). In 1997, the Libraries were awarded Library of
Congress/Ameritech National Digital Library Competition funds,
which supported the digitization of nearly 8,000 talent brochures
from the papers of the Redpath Lyceum Bureau, an agency that
represented performers on the Midwest Chautauqua circuit. In many
ways, this was the Libraries' first mass-digitization project (The
Library of Congress, 2003). Other scholarly digital collections created
at the time included the Center for Electronic Resources in African
Studies and the International Dada Archive.
In 2005, a new department, Digital Library Services (DLS), was
formed to manage digital initiatives. Since that time, DLS has
progressively worked toward coordinating more large-scale, "left-to-
right" projects both inside the Libraries as well as with faculty and
scholars on campus, moving away from the highly selective model of
the early web exhibits, which tended to contextualize the digital
materials for the user. Conversely, the model of comprehensive
digitization aims to allow users the freedom to repurpose digitized
materials for the creation of new knowledge and insight. This requires
not only a thorough approach to digitization on the part of the library,
but also the availability of tools to facilitate new uses for the items,
especially tools that link, overlay, and share data. Unfortunately, many
of these tools require extensive development and are not yet available
as turn-key solutions.
Situations such as the anniversary of an important institutional
event or a specific user-driven request have still led libraries to
undertake small- and medium-sized digitization projects. The
challenge for libraries is to create smaller digital collections that are
valuable for scholarship. Small-scale digital collections may provide
the same research potential as the entire physical collections from
which they are drawn.
Within the University of Iowa Libraries, DLS has cultivated a
particularly close relationship with three content providers: Special
Collections, Iowa Women's Archives, and University Archives. With
the first two units, digitization projects are becoming more large-scale
and left-to-right in their approach, but less so with the latter. This is
mostly due to the fact that University Archives subscribes to the
"principle of provenance", the archival practice of organizing
collections by creator rather than subject, and collections are often
selected for digitization based on a topic of interest rather than
contributing entity, making it difficult to consider left-to-right
scanning. For physical exhibits of University Archives materials, this
has long led to "artificial collections" that pull from many different
record groups, according to University of Iowa Archivist David
McCartney. More recent digital collections from the archives are no
exception (D. McCartney, personal communication, October 27,
2009). This kind of intermediation by the selector is a major
difference between the small-scale "boutique" digitization model and
mass-digitization.
In "Shifting Gears: Gearing Up to Get Into the Flow," the authors
direct libraries to stop obsessing about items and making curatorial
decisions about what to digitize since the selection has already been
done (Erway & Schaffner, 2007). Arguing that if items were
important enough to acquire originally, they are important enough to
digitize, the authors advocate a more programmatic approach of
digitizing items as they are accessioned, rather than thinking of
digitization in terms of discrete projects. Is it possible to work within
these provocative guidelines and still develop small-scale digital
collections that are "in the flow"? Yes. For example, libraries can push
the decision making process from the item level to the collection level.
After all, with so much catching up to do, there's no way existing
collections can all be scanned at once. Small collections can be
selected for digitization, and worked through left-to-right, even if that
means scanning hand-written grocery lists on backs of dry-cleaning
receipts (a real-world example from an Iowa Digital Library
collection). In this way, DLS has operated with the understanding that
a digital collection, even a small one, should be a comprehensive
surrogate of a physical collection, regardless of size or scope, which
attracts not only general interest, but promotes new scholarly
activities as well. Also, for text items, the combination of brief
metadata and optical character recognition (OCR) can provide access
without the need of extensive, handcrafted records.
Selecting the Collection
At a time when video is so easily accessible through websites such as Hulu,
Netflix, Boxee.tv, and YouTube as well as thousands of cable and
satellite television stations, it's hard to imagine the technical and
organizational challenges facing engineers and technologists involved
in the infancy of television. Were these pioneers of a century ago
similarly considering the scale of their new experimental medium? If
so, then the small digital collection described in this chapter is
particularly appropriate: "W9XK Experimental Television at Iowa"
(http://digital.lib.uiowa.edu/w9xk).
February 2009 was seen as an excellent opportunity to promote
the history of the W9XK experimental television station on the
University of Iowa campus by means of digitizing related archival
materials, both because of the congressionally-mandated conversion
to digital broadcast television, which was set to take place that month,
and the 75th anniversary of the station the year before. W9XK, the
first educational TV station in the U.S., went on the air January 25,
1933, with a weekly or twice-weekly schedule of lectures, music, and
drama. Reception was reported back to the university from as far away
as Oklahoma, Kentucky and New York (University Libraries.
University of Iowa, 2009). Appropriate materials were gathered from
throughout the archives including correspondence and newspaper
clippings from faculty and electrical engineering department subject
vertical files, still photographs from a prominent campus
photographer, articles from a campus engineering journal, WPA
federal writers project papers, and a student thesis.
With no single collection to process in a left-to-right manner, this
project is a departure from DLS's more recent production model
described earlier, and runs somewhat contrary to the challenges in
"Shifting Gears," but it is unlikely that these materials would be scanned
together in any other way short of a heroic effort to mass-digitize the
entire University Archives. The collection is small, containing just
over 50 digital objects (although some of the objects are composed of
several items grouped together, their organization to be discussed
below), but it tells an important story of the station's development and
operation in the broader context of early advances in television
technology, using the variety of formats listed above. Unfortunately,
no video or film footage of W9XK's broadcasts could be included in
the collection since none is known to exist today.
Building the Collection
From 2005 to 2007, digital initiatives at the University of Iowa
Libraries were completed mostly by staff in DLS in "in-house"
production. Scaling up to support larger digitization efforts has
required production tasks to occur in other library departments with
both the staffing and expertise to process not only more materials, but
also address unique handling and cataloging issues that invariably
arise from digitizing entire collections. Metadata is now mostly
applied by cataloging staff. The exception is collections that come to
the library already packaged with existing records, which call for
reorganization and standardization rather than new records created
from scratch. As a brand new collection, W9XK had its metadata
created from scratch.
In a similar way, the scanning that had been done by staff and
students working directly under DLS has mostly moved to the
Preservation and Reformatting department. However, scanning for
the W9XK collection broke from this workflow model. A staff member
in University Archives was interested in learning about scanning and
image editing, especially the operation of a new top-down book
scanner the Libraries had just purchased. The potentially small size of
the digital collection presented a good opportunity for that. In this
way, small-scale digital collections afford opportunities to deviate
from established workflows for staff to develop new skills. But with
other parts of production spread out, it may not make a considerable
difference in time to do just part of the production "in-house". The
time for an item in the W9XK collection to be completely processed
and added to the digital collection was neither considerably faster nor
slower than other collections.
The University Archivist selected and pulled appropriate
materials from the physical collections listed above and prepared
them for the staff member who scanned them on either an 11x17
flatbed scanner or the newly acquired top-down scanner, which
accommodates items up to 24 x 36 with a book cradle for bound
materials. Scans were then edited and display files derived by DLS
staff who then uploaded the images to CONTENTdm, the digital
library software that the Libraries use for many of their digital
collections. Lastly, cataloging staff added descriptive, administrative
and technical metadata to the objects. Metadata is almost always
applied to the digital items following upload to the digital content
management system in order to take advantage of software
functionality that allows for easier batch cataloging processes,
increasing efficiency.
Collection highlights and sample searches were selected for a
collection homepage, and the university archivist provided an
introductory paragraph, which takes the place of the more contextual
and interpretative writing that used to accompany web exhibits
directly alongside images. A lingering concern about small-scale
digital collections is packaging them in beautiful, but deep, web
portals that users must navigate to find the information they seek,
rather than letting digital objects live in a large repository with thin
logical collections built around them. A compromise is to maximize
collection exposure to search engines and aggregators. The Libraries
use Google Sitemaps to help assure collections are crawled and ranked
highly by Google. OAI harvesting of collections is enabled for outside
service providers, and the Libraries also implemented a next-
generation catalog that brings together records from the traditional
catalog as well as online books and journals, local web pages and
digital collections such as W9XK.
One recent change in how DLS delivers large-scale, left-to-right
digital collections is to use folder-level organization of the digital
objects. The digital library software used for most collections in the
Iowa Digital Library, CONTENTdm, uses "compound objects" to
deliver multi-page items, from the two sides of a postcard, pages of a
document, a monograph, or an entire folder's worth of materials
arranged hierarchically. This decreases the number of items for users
to browse through up front, or consider in results sets. While the
W9XK collection did not fit the folder-structure organization model
since its items came from many different archival collections, one of
the digital collection-delivery decisions made was to display
correspondence items (letters between engineering faculty, university
administrators, government authorities, and the public) so that letters
and their replies were in the same compound objects, rather than as
separate objects. One could argue that this mediation is too much
along the lines of a web exhibit, but it was thought to be helpful in
navigating the correspondence. Feedback from users would help to
confirm whether this was a good decision. In any case, digital librarians
often have to make organizational decisions based on software
functionality and with the intention of assisting users and researchers.
Use of the Collection
The W9XK Digital Collection was officially launched on February 17,
2009 through press releases and library blogs. While we must expect
that our users will tend to find our digital collections through the back
doors of search engines and aggregators rather than our finely-crafted
web portals, only 10% of visits have come directly through Google,
while more than half of all users are referred to the collection from
promotion on university and library pages as well as the library
catalog, although the search engine could have sent users to these
local pages first. Over 75% of page views to date occurred during the
first two months, owing much to the initial publicity (Figure IOWA-1).
[Chart: monthly page views, February through November 2009; vertical axis 0 to 3,000.]
Figure IOWA-1. W9XK Pageviews by month (2009)
While somewhat discouraging, most early use may have just been
casual interest, and it may take longer for research and scholarly use
to increase. Owing to Lally and Dunford's assessment of the impact of
Wikipedia in directing users to library digital collections, links to the
W9XK collection have recently been added to Wikipedia articles on
experimental television and may have an effect on traffic over time. In
"Using Wikipedia to Extend Digital Collections" (Lally, A. and
Dunford, C., 2007), the authors note that Wikipedia is "a prime
candidate for ... efforts at pushing information about the Libraries out
to where users conduct their research," as their server statistics
indicate it is indeed driving a significant amount of traffic to their
collections.
65% of visits to the W9XK collection home page have come from
within the state of Iowa, a higher percentage of local visits than for
two other small-scale digital collections in the Iowa Digital Library,
which were prompted by high use and requests for reproduction from
unique physical collections, especially where appeal went beyond the
local area (Figure IOWA-2).
[Bar chart comparing three collections: W9XK, Victor Projectors, and Rand & Leopold Desks. Surviving data labels: 27, 9.]
Figure IOWA-2. Percent collection use from Iowa networks
Two examples of these are the Victor Animatograph pamphlets
(http://digital.lib.uiowa.edu/victor) and the Rand and Leopold Desk
catalogs (http://digital.lib.uiowa.edu/desks), both with fewer than
30 items each. The Victor Animatograph Corporation of Davenport,
Iowa made the world's first 16mm motion picture projector, and was a
leader in the area of film technology throughout the 20th century.
People in possession of one of these projectors would often contact the
Libraries requesting copies of the product pamphlets, which contain
detailed schematic diagrams and operating instructions. This
collection was built before resources allowed for left-to-right
digitization, so rather than digitize the entire collection of papers from
the Victor Corporation, just the 21 highly-requested pamphlets were
scanned. Since launching the collection, 20% of visits to the collection
have come from outside the United States, and over 70% of visits to
the collection have come from outside the state of Iowa according to
site statistics.
The Rand & Leopold Desk Company was an office furniture
manufacturing firm which operated out of Burlington, Iowa for 102
years before closing in 1990. Furniture collectors had always been
interested in the Libraries' collection of 27 original product catalogs,
and there were many requests for reproduction. After scanning them
and making them available through the Iowa Digital Library, only 7%
of site visits to the digital collection have come from outside the US,
but over 90% have come from outside the state of Iowa. Perhaps more
so than the W9XK collection, these two small collections meet the goal
of facilitating information needs of users who cannot easily travel to
the UI Libraries.
Difficulties in Tying a Bow
Even when original goals are met, small-scale digital collections don't
often stay "done" for long. Scope-creep is a common danger, for even
when items haven't been cherry-picked for a digital collection, the
source collection almost always logically connects with more
collections. In trying to connect users with a broader universe of
resources, libraries are compelled to extend the scope of their digital
collections. In an example of how digital collections never seem to stay
finished, a collection of an artist's papers in the Iowa Women's
Archives was digitized to complement her digitized artwork. When
discussing publicity of the collection, IWA staff noted that they had
found several audio oral histories and also a video made about the
artist. Even when those multimedia materials are digitized and added
to the collection, it's possible that the archive will acquire additional
items that would need to be added to the digital collection in order for
it to remain comprehensive.
Conclusions
Even as mass-digitization projects begin and are sustained at libraries
through funding and organizational commitments to scaled-up
scanning workflows, requests for creating small-scale digital
collections will persist. This is because many unique collections,
especially those with preservation issues or spread out in archives, will
not be included in mass-digitization workflows. Likewise, the needs of
libraries' clientele and donor relationships do not necessarily coincide
with broad institutional objectives. Responding to these requests can
help strengthen relationships inside and outside of the library and
build support for digitization programs. Even one-off collections
should fit within programmatic guidelines, although in the case of the
W9XK collection, some liberties were taken in its creation. Small-scale
digital collections can be useful to researchers and the general public,
but they have to be accessible. Once found, the collections should be
complete to the point where users can be confident that they have the
entirety of a collection's information at their disposal. Although the
W9XK digital collection required the hand-selection of items, steps
were taken to ensure the fullness that is necessary in today's digital
libraries.
References
Committee on Institutional Cooperation. (n.d.). CIC-Google
agreement. Retrieved from
http://www.cic.net/Home/Projects/Library/BookSearch/CIC-Google.aspx
Erway, R., and Schaffner, J. (2007). Shifting gears: Gearing up to get
into the flow. Retrieved from
http://www.oclc.org/programs/publications/reports/2007-02.pdf
Lally, A., and Dunford, C. (2007). Using Wikipedia to extend digital
collections. D-Lib Magazine, 13(5/6).
Library of Congress. (2003). Library of Congress and Ameritech
Competition (1996-1999): Awards and collections. Retrieved from
http://lcweb4.loc.gov/ammem/award/collections.html
University Libraries. University of Iowa. (1999). Keeping our word /
North Lobby exhibit / University of Iowa. Retrieved from
http://www.lib.uiowa.edu/exhibits/keeping/intro1.htm
University Libraries. University of Iowa. (2009). Iowa City town and
campus scenes. Retrieved from http://digital.lib.uiowa.edu/ictcs
University Libraries. University of Iowa. (2009). Victor Animatograph
Company Pamphlets. Retrieved from
http://digital.lib.uiowa.edu/victor
University Libraries. University of Iowa. (2009). Map/diagram of
Midwest, 1930s. Retrieved from
http://digital.lib.uiowa.edu/u?/w9xk,67
University Libraries. University of Iowa. (2009). Rand and Leopold
Desk Company Catalogs. Retrieved from
http://digital.lib.uiowa.edu/desks
From Confusion and Chaos to Clarity and
Hope: Reorganization of Work Flows,
Processes, and Delivery for Digital
Libraries
Jody L. DeRidder (The University of Alabama Libraries)
Abstract
Digitization support within an institution may be fractured across
several departments, only partially funded, and may suffer restraints
imposed by delivery software which seriously hamper progress. Most
digitization is undertaken with little thought for the future; the result
is digital file chaos and confusion. Without clarification of file
identities and relationships, preservation and migration to new
systems are seriously hampered. Additionally, low funding for archival
staff may preclude the creation of valuable item-level metadata. The
University of Alabama Libraries leveraged the expertise available
across the library to build a cross-departmental collaboration with
which to face our challenges, recognizing that obstacles become
opportunities for creative solutions. We are involved in a series of
pilot projects to explore how to address the gap in archivist staffing to
create item-level metadata. This chapter shares our discoveries and
solutions.
Keywords: Cross-departmental collaboration, Digitization, Digital
file organization, Metadata creation, Open source software.
The first few years of most digital library initiatives are marked by
'boutique' collection development, in which the standards,
organization, methodology, metadata, file names, and consistency
vary considerably. At the time of my arrival at the University of
Alabama in mid-2008 as head of the new Digital Services department,
over thirty such digitization projects had been completed. Each
collection had its own file-naming system and metadata fields, with
inconsistencies throughout; nothing was standardized. Metadata in
the delivery software did not retain in any predictable fashion a
reference to the related archival files, and could not be exported in
full. Digital Services staffing was minimal, requiring time from the
Cataloging and Metadata Services for subject headings and upload,
from archival staff for preparation of content and descriptions, and
from Web Services to manage the interface and software support.
The scope of the task ahead was to expand heavily on the
scanning staff and equipment, develop a feasible set of systematic
work flows for supporting a large increase in scanning, build a cross-
departmental team capable of supporting digital library development,
and to create an organized and reusable set of digital content that is
not dependent upon resident knowledge for continuation or
restoration. Challenges included a simultaneous reduction in archivist
work hours, minimal space for expansion, difficult relations between
some departments, and insufficient time available from Web Services.
As in many smaller organizations, our digitization effort is
tremendously dependent upon cross-departmental collaboration.
Programming assistance and web delivery, metadata services,
archivist expertise and a regular influx of well-chosen content are all
critical to the development of our online research collections. A
previous gift from EBSCO Industries to the libraries supports
digitization and the development of technical infrastructure, but it
does not support the processing, arranging, and description of
archival collections. Our need for content creates a demand on the
archivists that they simply do not have the resources to meet.
Recognizing the need for improved cross-departmental
communications and teamwork, our dean (L. A. Pitschmann, personal
communication, August 25, 2008) called together lead representatives
(including two associate deans) from Library Technology, Web
Services, Collection Development, Cataloging and Metadata Services,
Special Collections and Archives, and Digital Services, to form an
ongoing Digital Programs group which would meet regularly to hash
out problems, develop alternatives, research opportunities and assign
priorities. The creation of this group was a stroke of brilliance. By
forming this framework for participation, setting forth a strategic goal
and providing clear administrative support, our dean laid the
groundwork for success. Given our multiple operational and relational
challenges, we could only succeed by seeking solutions with the
assistance of all impacted parties.
Against this backdrop we are working through four major
problems: digital file chaos, the inability to reunite metadata with the
archival content, software restrictions on the number of collections,
and a lack of archivist time to create item-level metadata.
Problems and Solutions
Digital File Chaos
Managed, efficient production and expanded growth of
collections requires standardization, not only of work flows and
procedures, but also of storage and file naming conventions. Delivery
systems become outmoded; migration into replacement software
requires consistency of legacy content. Consistency of file names and
storage patterns can also support cost-saving automation.
As mentioned, we already had over thirty digitized collections,
each with different file naming systems and metadata in various states
of disarray across a completely disorganized file system. What little
documentation existed was scattered. Collections had been digitized
for years with no road map, and with no concern beyond getting the
content onto the web. We needed a clear methodology for file
organization.
I consulted with the archivists to gain a greater awareness of the
scope of current and future digital content. After much debate, we
determined that it was most important for us to store content in such
a way that we could retrace the material to the archival analog
collection. Digital collections are transitory and overlapping by their
very nature; we decided that the perception of a digital collection must
be determined by metadata, not by origin (searching on a shared value
can retrieve all components of a digital collection). Together we
developed a file naming scheme to encompass all our holdings and
projected digitization plans for the next few years. We created
hierarchical levels of organization: first by holder, then by collection
within those holdings, then by item within that collection, and finally
by sequence for delivery (Figure ALAB-1). Each hierarchical level is
concatenated with an underscore in the file name, so that provenance
and location, as well as sequence for delivery, are automatable and
clear.
Holder ID: u0003
Collection ID: 0000023
Item ID: 0000007
Sequence ID: 0005
Archival File: u0003_0000023_0000007_0005.tif
Figure ALAB-1. University of Alabama Libraries Digital File Naming Scheme. (© 2009,
University of Alabama Libraries. Used with permission.)
A "holder" is construed loosely, as we could not incorporate all
the variant hierarchies of our organization feasibly into a file name.
Supporting metadata clarifies identity and relationships. It was
important for us to prefix each identifier with a letter so it can be used
as an ID attribute within XML files (W3C, 2005). The n groupings are
non-profit agencies, the p is for patron holdings, and u is for content
from holdings areas within our university. We grouped content by
format, as each format requires different handling and metadata
description. For example, u0001 is the still image collections from the
Hoole Special Collections; u0002 is rare books, u0003 is manuscript
collections, and so forth. Any of these holdings areas may have
multiple collections.
For the collection number, which composes the second segment
of an identifier (concatenated with an underscore separator), we
echoed the existing collection numbering system whenever possible.
Thus, MS 1980 will be the collection number ending in 1980, for
example.
As an example, u0003_0000252 is the identifier for the digitized
manuscript content of the collection known to our archivists as MS
252. Items and their subsidiary pages, however, are numbered
sequentially. Item numbers follow the collection number, again,
concatenated after an underscore. For example, the fifth digitized
letter in the MS 252 collection would be identified by
u0003_0000252_0000005. If it's a multi-page item, there will be a
fourth set of numbers here, one per page, to identify the sequence for
delivery. The image for page 4 of letter 5 would be identified by
u0003_0000252_0000005_0004.
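The identifier scheme described above can be sketched in Python. This is an illustration only, not the Libraries' own code: the field widths (a prefix letter plus four digits, seven digits, seven digits, four digits) are inferred from the examples in the text, the names parse_identifier and build_identifier are hypothetical, and any further sub-page level is left out.

```python
import re
from typing import NamedTuple, Optional

# Field widths are an assumption inferred from the chapter's examples:
# holder = prefix letter (n, p, or u) + 4 digits, collection = 7 digits,
# optional item = 7 digits, optional page sequence = 4 digits.
ID_PATTERN = re.compile(
    r"^(?P<holder>[npu]\d{4})"
    r"_(?P<collection>\d{7})"
    r"(?:_(?P<item>\d{7})"
    r"(?:_(?P<sequence>\d{4}))?)?$"
)


class ArchivalId(NamedTuple):
    holder: str
    collection: str
    item: Optional[str]
    sequence: Optional[str]


def parse_identifier(identifier: str) -> ArchivalId:
    """Split an identifier into its hierarchical levels (hypothetical helper)."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a valid identifier: {identifier!r}")
    return ArchivalId(**match.groupdict())


def build_identifier(
    holder: str,
    collection: int,
    item: Optional[int] = None,
    sequence: Optional[int] = None,
) -> str:
    """Join zero-padded levels with underscores, broadest level first."""
    parts = [holder, f"{collection:07d}"]
    if item is not None:
        parts.append(f"{item:07d}")
        if sequence is not None:
            parts.append(f"{sequence:04d}")
    return "_".join(parts)
```

Calling build_identifier("u0003", 252, 5, 4) produces the page-level identifier used as the example above, and parse_identifier rejects any name that does not begin with a lowercase n, p, or u prefix.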
Thus, every part of every object has a defining and unique
identifier which relates it to every other file in some fashion. We
struggled with all the different anomalies we currently had and could
foresee, simplifying this identification system as much as possible
while still retaining the generic flexibility to apply it to all of our
digitized content, regardless of the form or treatment. We then
expended a great deal of effort to bring all legacy content into the new
file organization scheme, gathering what information was still
available to make sense of the chaos of files that remained from
previous projects. Because of this system of file naming and
organization, we are now able to automate much of our quality
control. We have scripts that locate missing sequences, incorrectly
named files, or files in the wrong place. We even have scripts to repair
file names when large quantities of file names are in error. This has
already saved us hundreds of hours.
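The kind of quality-control check described here might look something like the following Python sketch. It is not the Libraries' actual script. It assumes page files are named with the four-part pattern from the examples, that page sequences start at 0001, and that the input is simply a list of file names; check_page_files is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed pattern for a page-level archival file: item identifier,
# underscore, four-digit sequence, ".tif" extension.
PAGE_FILE = re.compile(r"^([npu]\d{4}_\d{7}_\d{7})_(\d{4})\.tif$")


def check_page_files(filenames):
    """Return (missing, misnamed) for a list of page-level file names.

    missing maps each item identifier to the sequence numbers absent
    between 1 and the highest sequence seen; misnamed lists every name
    that does not fit the assumed pattern.
    """
    pages = defaultdict(set)
    misnamed = []
    for name in filenames:
        match = PAGE_FILE.match(name)
        if match is None:
            misnamed.append(name)
        else:
            pages[match.group(1)].add(int(match.group(2)))

    missing = {}
    for item, seen in pages.items():
        gaps = sorted(set(range(1, max(seen) + 1)) - seen)
        if gaps:
            missing[item] = gaps
    return missing, misnamed
```

A check like this cannot detect pages missing from the end of an item, since it only knows the highest sequence number present; that would need a page count from the item's metadata.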
Given that each level of the organization scheme contains or
relates information which applies to each level below it, our associate
dean (Thomas C. Wilson) decided it made sense to echo this file
naming scheme in the directory organization for storing files.
Everything from a particular holder should be together in a directory
named for that holder. Everything from a particular collection should
be in a directory named for that collection. Thus, in our storage
directories, all of the Hoole manuscript collections will be within the
same directory (u0003). This is an intuitive use of the file directories
to provide clarity and simplicity of organization.
For example, within the u0003 directory is a subdirectory
0000159, which contains information about all digitized content from
MS 159. Within this collection directory exists a subdirectory for each
item, named for the item number. The storage directories thus echo
the file naming scheme, providing clear, simple, automatable
organization. Drilling down through the file system to the logical
depth locates the digitized archival file (Figure ALAB-2).
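Because the directories echo the file name, the storage location can be derived from the name alone. A minimal Python sketch follows, assuming each underscore-separated segment of a page-level file name becomes one directory level; the storage root "/archive" and the function name storage_path are invented for illustration.

```python
from pathlib import PurePosixPath


def storage_path(filename: str, root: str = "/archive") -> PurePosixPath:
    """Derive the storage location of an archival file from its name.

    Each underscore-separated segment of the identifier becomes one
    directory level beneath the (hypothetical) storage root.
    """
    identifier = filename.split(".", 1)[0]   # drop the extension
    levels = identifier.split("_")           # holder, collection, item, page
    return PurePosixPath(root, *levels) / filename
```

For page 4 of the first item in MS 159, this yields /archive/u0003/0000159/0000001/0004/u0003_0000159_0000001_0004.tif.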
Metadata and documentation are stored at the levels to which
they apply. Thus, metadata about the collection is in the Metadata
folder at the collection level, metadata about an item at the item level,
and metadata about a specific page is at the page level. Each sub-file
inherits the information available at the levels above it. Thus,
provenance documentation added at the collection or holder level
clearly applies to all files in the directories below it. If some
information only applies to page 4 of a letter, it is stored in that file's
directory.
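The inheritance rule can be illustrated with a short Python sketch that lists every Metadata folder whose contents would apply to a given file, from the holder level down to the page level. The folder name "Metadata" comes from the text; the storage root and the function name applicable_metadata_dirs are assumptions made for the example.

```python
from pathlib import PurePosixPath


def applicable_metadata_dirs(filename: str, root: str = "/archive"):
    """List the Metadata folders that apply to a file, broadest level first.

    A file inherits whatever is stored at each level above it, so one
    folder is returned per underscore-separated segment of its name.
    """
    levels = filename.split(".", 1)[0].split("_")
    current = PurePosixPath(root)
    folders = []
    for level in levels:
        current = current / level
        folders.append(current / "Metadata")
    return folders
```

For a page-level file this returns four folders: holder, collection, item, and page, in that order.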
An organizational patterning such as this (Figure ALAB-3) retains
the item structure, both physically and nominally, through the file
identifiers. The simplicity, systematic numbering, sequencing, and
clear documentation stored at the level applicable makes the digital
content resurrectable for future delivery systems, without complex
metadata schemes or database dependence. The organizational
scheme is built to be scalable and extensible enough to manage digital
content into the foreseeable future. In addition, because the directory
structure echoes the file names, we were able to automate the storage
of content and the creation of attendant LOCKSS (LOCKSS, 2008)
manifests.
Figure ALAB-2. University of Alabama Libraries LOCKSS Content Organization. (©
2009, University of Alabama Libraries. Used with permission.)
u0003
    0000001
        0000001
        0000002
        0000003
            0001
            0002
                001
                    Metadata
                    u0003_0000001_0000003_0002_001.tif
                002
                    Metadata
                    u0003_0000001_0000003_0002_002.tif
Figure ALAB-3. University of Alabama Libraries Digital File Naming Scheme: Sub-
page numbering. (© 2009, University of Alabama Libraries. Used with permission.)
The manifest pages (Figure ALAB-4) link all files to be harvested
for duplication across the Alabama Digital Preservation Network
(Network of Alabama Academic Libraries, 2009), which is to date the
lowest-cost model of digital preservation.
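As a rough illustration of how a manifest page could be generated from a list of file names, consider the Python sketch below. It is not the Libraries' script and does not claim to satisfy every LOCKSS requirement: the page layout is simplified, the links are bare file names rather than real paths, and manifest_page is a hypothetical function name. Only the permission statement is taken from the example manifest.

```python
from html import escape

# Permission statement as it appears on the example manifest page.
PERMISSION = (
    "LOCKSS system has permission to collect, preserve, and serve this "
    "Archival Unit"
)


def manifest_page(collection_id, filenames):
    """Build a bare-bones HTML manifest linking every file in a collection."""
    links = "\n".join(
        f'    <li><a href="{escape(name)}">{escape(name)}</a></li>'
        for name in sorted(filenames)
    )
    return (
        "<html>\n"
        f"<head><title>{escape(collection_id)} Manifest Page</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(collection_id)} Manifest Page</h1>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        f"  <p>{PERMISSION}</p>\n"
        "</body>\n"
        "</html>\n"
    )
```

Because file names sort in delivery order under this naming scheme, a plain alphabetical sort is enough to list pages in sequence.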
To secure our content further, we capture MD5 checksums upon
deposit in our storage system, and verify them weekly prior to full-
tape backups. This enables us to ensure that the original file is
unaltered over time, and can be restored from a good backup should
any corruption occur.
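A minimal Python sketch of this capture-and-verify cycle is shown below, using the standard hashlib module. How the recorded checksums are stored and how the weekly schedule is run are left out; the function names are hypothetical and this is not the Libraries' own tooling.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(directory):
    """Map each file's path (relative to directory) to its MD5 digest."""
    base = Path(directory)
    return {
        str(path.relative_to(base)): md5_of(path)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def verify_checksums(directory, recorded):
    """Return the recorded files that are now missing or have changed."""
    current = record_checksums(directory)
    return sorted(
        name for name, digest in recorded.items()
        if current.get(name) != digest
    )
```

record_checksums would run once at deposit, and verify_checksums before each backup; any name it returns signals a file to restore from an earlier good copy.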
Reuniting Metadata with Archival Content
Software systems come and go, and successful transition between
them is dependent upon standardized and coherent content and
metadata. Most of our metadata had been altered or added after
upload of content into our delivery system, and did not exist with the
archival files. Upon examination of the exports from our software
delivery system, we found that no single export contained all the
metadata for a given archival object. Even in the most complete export
option, there was no page-level metadata apart from the title and
transcript, and archival file references were sometimes altered, often
missing, and always contained reference to the upload directories,
which no longer existed.
Tom S. Bird song paper
u0003_0000159 Manifest Page
Administrative Information
• u0003_0000159.v1.xml
Collection Level Metadata
• u0003_0000159.v1.txt
Content
• u0003_0000159_0000001_0001.tif
• u0003_0000159_0000001_0002.tif
• u0003_0000159_0000001_0003.tif
• u0003_0000159_0000001_0004.tif
• u0003_0000159_0000001_0005.tif
• u0003_0000159_0000001_0006.tif
• u0003_0000159_0000001_0007.tif
• u0003_0000159_0000001_0008.tif
• u0003_0000159_0000001_0009.tif
• u0003_0000159_0000001_0010.tif
• u0003_0000159_0000001_0011.tif
• u0003_0000159_0000001_0012.tif
• u0003_0000159_0000001_0013.tif
• u0003_0000159_0000001_0014.tif
LOCKSS system has permission to collect, preserve, and serve this
Archival Unit
Figure ALAB-4. University of Alabama Libraries LOCKSS Manifest example. (© 2009,
University of Alabama Libraries. Used with permission.)
In the metadata, if indeed there was an identifier, it was stored in
multiple different fields in different forms. Sometimes there was
nothing at all to indicate what the original file name had been.
We studied the file naming schemes used in all the 30-some
existing collections, working out how to rename the files to retain the
ordering of delivery, the intended organization of complex files, and
the relationships between related items. Analyzing the metadata in the
CONTENTdm (OCLC, 2009a) database, we matched up what
information we could locate with what little had been retained with
the archival files, and slowly filled in the gaps. Whatever sorting and
matching could not be scripted was done by hand, often requiring
research and investigation.
During this time we explored the use of 7train (Fogel & Hetzner,
n.d.) for transforming exported CONTENTdm Standard XML
metadata into METS (Library of Congress, 2009b) files, which
reordered and clarified the relationships between complex objects.
Based on 7train's method of using the first Dublin Core (DCMI, 2009)
identifier field in the export as the file name for the resultant METS
file, we selected this field for our file identifier, and tagged it the same
in every collection. Our metadata librarian (Mary Alexander) has
worked hard to remediate the metadata, entering the correct unique
file identifier in the specified location in all records. Only by repairing
the descriptive metadata to consistently reference the correct
identifier can we match our exported metadata to the appropriate
archival files.
The benefits of using this process are that the Dublin Core
metadata assigned to the object at the top level is retained, the
organization of the complex digital object is retained, and the
transcripts or Optical Character Recognition (OCR) content are
included in the resultant METS file. However, what is lost includes
any metadata which does not map to Dublin Core, value-added labels
which clarify the content in the fields, and page-level metadata beyond
the title and transcript. If no archival file was used in upload, it is not
referenced in the METS; those referenced may have their file name
altered, and always refer to the upload directory. If the location of this
directory is not corrected after upload to reference the actual location
of the archival file, this is useless for reuniting metadata and archival
content.
This version of METS was designed for web delivery, not for
preservation. California Digital Library created 7train to enable
repositories in the state that were using CONTENTdm to participate
in their state-wide digital federated search service (Fogel, 2006),
which requires METS (California Digital Library, 2009). The METS
file contains links into the CONTENTdm software for access to the
thumbnails and service derivatives. For preservation, these links will
be useless, as delivery systems change rapidly, if indeed the content is
still online. Additionally, the 7train METS contains no technical or
preservation metadata, as none was created by, or exported from,
CONTENTdm.
Seeking to make our METS files more useful, we determined how
to add technical metadata, and analyzed the database structure and
storage directories to identify the actual location and name of
thumbnails and service images. Scripting to replace the derivative
links and archival file reference with full path links to the actual files
proved to be more trouble than it was worth.
However, as our storage system began taking shape, another
option emerged. Our storage structure reflects the compound file
structure, creating an unambiguous arrangement which will survive
any delivery software. A digital archivist of the future should have no
trouble reconstructing our content. The METS file itself may be
redundant. Rather than altering the 7train METS file to meet our
preservation needs, we decided instead to leverage our organizational
scheme to meet the challenge. All we really need to do is to create the
technical metadata, name it correctly, and drop it in the right
directory (Figure ALAB-5). Then we will compile the metadata and
content links for an item via script into a preservation-ready METS
file for long-term storage.
u0003
    0000001
        0000001
        0000002
            0001
                Metadata
                    u0003_0000001_0000002_0001.mix.xml
                    u0003_0000001_0000002_0001.ocr.txt
                u0003_0000001_0000002_0001.tif
            0002
                Metadata
                    u0003_0000001_0000002_0002.mix.xml
                    u0003_0000001_0000002_0002.ocr.txt
                u0003_0000001_0000002_0002.tif
            0003
                Metadata
                    u0003_0000001_0000002_0003.mix.xml
                    u0003_0000001_0000002_0003.ocr.txt
                u0003_0000001_0000002_0003.tif
            Documentation
            Metadata
                u0003_0000001_0000002.mods.xml
        0000003
        0000004
        Documentation
        Metadata
Figure ALAB-5. University of Alabama digital file organization for preservation. (©
2009, University of Alabama Libraries. Used with permission.)
Software Limitations on the Number of Collections
The expansion of Digital Services had a major impact on Hoole
Special Collections. Archivists were scrambling to find sufficient
content to digitize at first; small collections were easiest to pull
together, so we suddenly were digitizing many tiny collections. What
an archivist considers a collection is determined by provenance, not
by quantity.
However, CONTENTdm (version 4.3) was designed to support no
more than 200 collections. It was clear that we could no longer define
our collections the same way our delivery software did. Therefore we
needed to reorganize our content in a way that provided the user
access desired by our archivists, while meeting the constraints of our
current delivery system.
We met with the archivists to ask what kind of grouping made
sense to them. After much debate, they finally selected date ranges for
most of our content, which corresponded to particular eventful
periods in American history. Access to each of the digitized analog
collections would be managed by canned links searching for the
analog collection name in the Dublin Core relation.isPartOf field.
Archivists assigned each collection to a time period, and the metadata
librarian and I began to sort out how to merge multiple collections
and split combined ones.
When merging collections in CONTENTdm, the two collections
being merged must have identical metadata fields. It was during this
process that we realized the depth and variety of our metadata across
all those 30-odd collections. It is far more than can be captured in
Dublin Core, and we realized that we wanted to retain the value-added
labels. Knowing that a person is a photographer or a lyricist or a
composer or a performer is far more valuable than can be conveyed by
Dublin Core "creator."
Our metadata librarian combined all our metadata fields into a
single spreadsheet, so that all the CONTENTdm "containers" could
handle any of our collections. Since each label requires a different
spreadsheet column for ingest, we found we needed 87 columns.
However, by hiding the columns that aren't needed, this becomes
manageable. Different versions of this spreadsheet are used for each
type of material, simply with different hidden columns. Since all the
containers have the same metadata configuration, this streamlines
uploads. The metadata librarian need only unhide the columns and
export the spreadsheet for upload. Recognizing the value of the many
tags in use, we decided to map all the fields to MODS (Library of
Congress, 2009c) to discover if that metadata standard would be
capable of retaining all our descriptive information; and indeed it
does.
My associate dean recognized the possibilities for further
leveraging our current file organization for better user access and
delivery, sidestepping many of the restrictions placed on us by
CONTENTdm. He procured the services of a talented programmer
(Tonio Loewald), who proceeded to write a translator, which, given a
template, can translate any Unicode tab-delimited spreadsheet into
separate XML records of the desired schema. His Archivists Utility
(Loewald, 2009b) reads in our 87-field spreadsheets and creates a
MODS XML record from each line. This enables us to capture all of
our metadata, not just the Dublin Core elements, outside of the
delivery software for preservation storage.
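The row-to-record translation described above can be sketched roughly as follows. This is not the Archivists Utility itself, only a minimal Python illustration of the idea. The column labels ("Title", "Photographer"), the function names, and the sample row are hypothetical, and a real template would cover all 87 columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row):
    """Build one MODS record from one spreadsheet row (a dict).

    The column labels used here ("Title", "Photographer") are
    hypothetical stand-ins for the real spreadsheet headings.
    """
    mods = ET.Element(q("mods"))
    if row.get("Title"):
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = row["Title"]
    if row.get("Photographer"):
        # Keep the specific role instead of flattening it to "creator".
        name = ET.SubElement(mods, q("name"))
        ET.SubElement(name, q("namePart")).text = row["Photographer"]
        role = ET.SubElement(name, q("role"))
        role_term = ET.SubElement(role, q("roleTerm"), {"type": "text"})
        role_term.text = "photographer"
    return mods


def spreadsheet_to_mods(text):
    """Translate tab-delimited text into a list of MODS XML strings."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [ET.tostring(row_to_mods(row), encoding="unicode") for row in reader]


# Invented sample data: one header line and one record line.
sample = "Title\tPhotographer\nCampus view\tJane Doe\n"
records = spreadsheet_to_mods(sample)
```

Each spreadsheet line yields one XML string, which could then be written to a file named after the item and placed in that item's Metadata directory.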
We added scripts using open-source software such as
ImageMagick (ImageMagick Studio LLC, 2009), LAME (Cheng et al.,
2009), and Tesseract-OCR (Google, 2009) to generate derivatives
from the archival files. These are placed into web-accessible
directories that mirror our archive structure. By adding the MODS
and our newly emerging EADs (Library of Congress, 2009a), we
now have all the components for an open, modular delivery system.
Under my associate dean's direction, the programmer has built an
XML schema-agnostic delivery system, Acumen (Loewald, 2009a),
that reads the XML where it lives in a live directory. Metadata and
derivatives can be accessed easily and changed at any time without
going through any software system. Web agents and web search
engine crawlers can easily access our online content also, as it is not
buried in a back-end database. Relationships between files are
inferred by the file naming system, so that all components of an item
are retrieved together, and all items in a collection can also be
retrieved by using the collection identifier. In addition, since the file
name communicates the hierarchy and relationships of files, digital
content can be reorganized according to work flow or even moved
between servers while keeping unchanging URLs.
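To illustrate how relationships can be inferred from names alone, here is a small Python sketch. It is not Acumen's code. The function names are hypothetical, and the assumption that the first three underscore-separated segments identify an item (and the first two a collection) is our reading of the file names in Figure ALAB-5.

```python
from collections import defaultdict


def parse_name(filename):
    """Split a file name into its identifier segments and its suffix."""
    stem, _, suffix = filename.partition(".")
    segments = stem.lower().split("_")
    return segments, suffix


def item_id(filename, depth=3):
    """Return the identifier shared by every component of one item.

    depth=3 assumes the first three segments name the item, as the
    item-level record u0003_0000001_0000002.mods.xml suggests.
    """
    segments, _ = parse_name(filename)
    return "_".join(segments[:depth])


def group_by_item(filenames):
    """Group file names so all components of an item come back together."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[item_id(name)].append(name)
    return dict(groups)


files = [
    "u0003_0000001_0000002_0001.tif",
    "u0003_0000001_0000002_0001.mix.xml",
    "u0003_0000001_0000002.mods.xml",
    "u0003_0000001_0000003_0001.tif",
]
grouped = group_by_item(files)
```

Calling `item_id(name, depth=2)` would group by the assumed collection identifier instead. Because the grouping depends only on the name, the files can sit in any directory or on any server.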
We're very excited about the possibilities this modular
methodology offers. By bringing digital content up to the level of the
web, we are setting the stage for semantic web applications and the
development of user-friendly tools for access and reuse of our content.
Staffing Gap for Creation of Item-level Metadata
Shortage in funding support for archivists is widespread. With
the current attention to digitization of archival content, and the lack of
funding for archivists, a critical gap is created. The archivists are well-
versed in the knowledge of the time periods, issues, relationships, and
people related to the archival content we seek to digitize; they are the
personnel best suited to describe the material in a way which will
enable users the best possible access. A knowledgeable archivist will
be able to identify important personages of the time, particular
buildings and locations, and can provide biographical, historical, and
cultural context which gives meaning to the documents we seek to
digitize. Since the metadata provides the necessary information for
successful retrieval, correct descriptive information may be the most
valuable knowledge capture we could provide. However, the archivists
are hard pressed to provide even minimal processing to the
burgeoning mass of incoming content. In order for us to move forward
in digitization without more funding for archivists, we began to devise
pilot projects to seek out alternative possibilities.
Our first pilot project involved students creating item-level
descriptive metadata as they digitized content. The collections chosen
were small image collections containing a little over 200 photographs.
To test for consistency, we assigned only one student to one of the
collections, and four different students worked on the second
collection. Within a few days it was clear that our careful instructions
to the lone student were insufficient. Her metadata
spreadsheet was rife with errors; the primary focal person in the
collection had his name misspelled seven different ways, and many
words were abbreviated (and usually incorrectly). In 56 entries, we
located 217 spelling and abbreviation errors. Grammar, punctuation,
capitalization and spelling errors were abundant, and the captions and
descriptions created were vague and unhelpful.
In the jointly described collection, each student had his own
methodology and focus, and the variations between the choices for
descriptions and captions were sufficient to impair search and
retrieval even within the collection itself. Where one student might
use the terms "infant" and "woman," a second student would use the
terms "baby" and "lady." While the errors and misspellings were far
less frequent, it was clear that for consistency, we needed either
stricter guidelines and controlled vocabularies, or to limit the number
of employees assigned to create metadata for any one collection.
Additionally, since the students had no background information on
the context of the collection, the time period, or the content, they were
unable to identify well-known buildings, well-known personages, or
other significant content of interest. This point became extremely
clear when our students could not identify such historically important
figures as Governor George Wallace or the locally beloved Paul "Bear"
Bryant.
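The controlled-vocabulary option can be illustrated with a short Python sketch. The mapping below is invented for the example (choosing "infant" and "woman" as preferred terms is arbitrary), and the function name is hypothetical. A real list would come from the metadata librarians or an established thesaurus.

```python
# Hypothetical variant-to-preferred mapping, for illustration only.
PREFERRED_TERMS = {
    "baby": "infant",
    "lady": "woman",
}


def normalize_terms(terms):
    """Map each term to its preferred form, dropping duplicates in order."""
    seen = set()
    result = []
    for term in terms:
        key = term.strip().lower()
        preferred = PREFERRED_TERMS.get(key, key)
        if preferred not in seen:
            seen.add(preferred)
            result.append(preferred)
    return result


# Two students describing the same photograph in different words.
student_a = normalize_terms(["infant", "woman"])
student_b = normalize_terms(["Baby", "lady", "woman"])
```

After normalization both descriptions use the same terms, so a search for either word retrieves both records. A mapping like this addresses vocabulary drift only. It does nothing for the missing contextual knowledge described above.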
A second pilot project involves the reuse of existing MARC
metadata for sheet music which had been cataloged over a period of
years by various catalogers. After extraction of the MARC records and
analysis, we were dismayed at the variations in how the MARC fields
were used. For example, we found the first line of text in fields 590,
500, 740, and 246, along with other types of content. The arranger,
lyricist, musician, and composer names were interspersed in field
245c with various textual prefixes, preventing a systematic method of
separation. Without remediation, transformation to another metadata
scheme would not be recommended, as we cannot safely crosswalk
these fields. Metadata librarians and catalogers are currently involved
in repairs to the original records.
A third pilot project involves having an archivist and a metadata
librarian each develop their own version of item-level metadata for
the same photograph and correspondence collections. Each worker
gathered time measurements, and neither party was allowed to view
the other's descriptions until finalized. Both versions of each
collection will be put online in the same web interface, and usability
tests will determine to what extent the differences impact user
experience. Questions for users will be derived from the known
metadata in the finding aid, since the information in the series
description would lead the user to expect success in searching for the
item-level material that fit the description. If, for example, the series
description mentions letters about boarding schools prior to 1900, a
query in the usability study would be to locate a letter about boarding
schools written prior to 1900. This method of creating queries was not
communicated to the metadata creators, so it could not skew the
results. It is possible that neither metadata version will provide
helpful results to the queries. In the final analysis, the level of usability
for each version will be weighed against the time cost and availability
of the personnel in the department that created it. We hope to be able
to relieve the archivists of the burden of item-level metadata creation.
While archivists know more about the content, and provide more
detailed and informed metadata, their available time is very limited.
Over the years, we have assumed that the expertise of the metadata
creators is of primary importance, but new findings by Paul Conway
(Conway, 2009) have thrown doubt on that belief. Since "the proof is
in the pudding," we will determine if it is to our advantage to reduce
the quality of metadata in order to get more of our content online.
A fourth pilot project will depend solely upon EAD finding aid
descriptions at the series and subseries level to provide findability
and context for digitized items. Our Archival Access Coordinator
(Donnelly Lancaster Walton) suggested that we seek to recreate online
the experience of a patron exploring material in the reading room: a
folder is opened, and the patron goes through the documents one at a
time. No other information is available to describe the material. This
methodology will enable archivists to focus on EAD finding aid
descriptions, and our digitization team will take box after box and
simply digitize content in order for web delivery. As content is
digitized, links will be added to the online finding aids from the folder
level, out to web directories which contain the digitized items, ordered
as they were encountered in the folder. In this manner, we will be able
to provide online access to huge collections for which we have
insufficient resources to provide item-level descriptions.
We have just been funded by NHPRC to demonstrate this low-
cost, scalable model (University of Alabama Libraries, 2010a). Already
we have developed scripts (University of Alabama Libraries, 2010b) to
add links into the EADs and create minimal MODS records as
quantities of scans become available. The software we develop to
support this mass digitization method will be made available open
source. As soon as the content begins to appear online, we will
conduct usability studies to compare the user experience between
accessing content via the finding aid, versus item-level search and
retrieval, and will publicize the results.
The fifth pilot project we expect to undertake in the coming
months will involve users in tagging our materials online. To the
extent possible, we will repurpose available open source software,
capturing the tags and free-text descriptions in a database for vetting
by our metadata librarians. Tags may be made automatically visible in
the web interface and included in indexing for search and retrieval.
After review, acceptable entries will be added to the descriptive
metadata record on file. Our first foray into this venue will be with
photograph collections for which we have almost no descriptive
metadata. Hence, any apparently valid contributions will be accepted.
In this manner we also hope to build user interest and support as we
build interaction into our interface.
Summary
In the real world, digitization support within an institution may be
fragmented across several departments, only partially funded by
donor gifts, and may suffer seemingly arbitrary restraints imposed by
delivery software. Some of the departments upon which success
depends may be understaffed and unable to meet the demands for
digitization support. In addition, most digitization is undertaken with
little thought for the future, either in terms of transitioning to
alternative delivery software, or long-term access to digital content.
Obstacles are opportunities for creative solutions. We leveraged
expertise available across the library to build a successful cross-
departmental collaboration. We are developing open source software
support for an open, scalable, modular digital content delivery system
with consideration for long-term preservation. Our file organization
patterns alone may prove to be a life raft for digital content which
funding cuts have left unsupported. This or a similar patterning will
make reconstruction of digital content far easier. By possibly adding a
BagIt manifest (Boyko, Kunze, Littman, Madden & Vargas, 2009), it
becomes potentially feasible to zip up entire digital archives for long
term storage in a safe repository or LOCKSS until such time as
funding support returns.
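As a rough sketch of what generating such a manifest might involve, the Python below writes one checksum line per file under a directory. It produces only manifest-style lines and does not build a complete bag (there is no bag declaration and no files are copied). The function names and the demo file are hypothetical. The "data/" prefix follows the BagIt convention that payload files live under a data directory.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(root):
    """Return one 'checksum  path' line per file under root.

    Paths are relative to root, use forward slashes, and carry a
    "data/" prefix as if the tree were a bag's payload directory.
    """
    lines = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            lines.append(f"{md5_of(full)}  data/{relative}")
    # Sort by path so the manifest is stable from run to run.
    return sorted(lines, key=lambda line: line.split("  ", 1)[1])


# Demo on a throwaway directory with one invented file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "0001"))
    with open(os.path.join(root, "0001", "page.txt"), "wb") as handle:
        handle.write(b"hello")
    demo = manifest_lines(root)
```

Rerunning the same function later and comparing the lines against the stored manifest gives a simple fixity check on the archived files.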
The delivery system we are developing will be free and open
source, requiring minimal technical expertise. An underfunded
institution will be able to use it to raise their content to the level of the
web where search engines can promote their materials and web agents
can provide greater usability. Since the actual metadata and delivery
content is not ingested, but left in the web directory, it can be changed
as needed, without risk of harm to the online delivery. Metadata
schemes change regularly, and this XML schema-agnostic, modular
solution is a low-cost, scalable, simple approach to building online
digital libraries with the future in mind.
Thus far, we still use CONTENTdm for participation in multi-site
search systems that depend upon this software, such as our state-wide
Alabama Mosaic (Network of Alabama Academic Libraries, n.d.).
However, the winds of change are blowing. During an open discussion
at the 2009 AlabamaMosaic Annual Meeting in Montgomery,
Alabama, the director reported a conversation with an OCLC
technician who stated that FirstSearch would replace the
CONTENTdm Multi-Site server. Thus, continued support of the
CONTENTdm Multi-Site Server (OCLC, 2009c) is suspect, as OCLC
moves toward FirstSearch (OCLC, 2009b) for cross-database search
support. Many small collaborative digitization efforts depend upon
CONTENTdm, but they need not be held hostage by a proprietary
system. Acumen could potentially fill the gap, offering a low cost
option providing much of the same functionality without the
constraints.
We continue to explore how best to fill or reduce the gap created
by the apparent need for item-level metadata for access and retrieval,
and hope that the solutions we adopt in that area will help others as
well. We have certainly found that working together with mutual
respect and consideration has brought both challenges and
unexpected benefits. By bringing everyone to the table, struggling
together with the chaos before us and facing the same goals, we have
unleashed passion and creativity, enabling us to make astounding
strides into clarity, organization, and hope for the future.
References
Boyko, A., Kunze, J., Littman, J., Madden, L. & Vargas, B. (2009). The
BagIt file packaging format (V0.96). Retrieved November 14,
2009, from http://www.cdlib.org/inside/diglib/bagit/
bagitspec.html
California Digital Library. (2009). Online Archive of California
(OAC). Retrieved November 15, 2009, from http://www.cdlib.org
/inside/projects/oac/
Cheng, M., Taylor, M., Hegemann, R., Leidinger, A., Tominaga, T.,
Shibata, N., et al. (2009). The LAME project. Retrieved November
14, 2009, from http://lame.sourceforge.net/index.php
Conway, P. (2009). The image and the expert user: a qualitative
investigation of decision-making. Paper presented at Archiving
2009, Arlington VA. In Archiving 2009, Vol. 6. (pp. 142-150).
Society for Imaging Sciences and Technology.
DCMI. (2009). Dublin Core metadata initiative. Retrieved November
15, 2009, from http://dublincore.org/
Fogel, P. (2006). CDL 7train Profile - CONTENTdm simple and
complex objects in METS Metadata encoding and transmission
standard. Retrieved November 15, 2009, from
http://www.loc.gov/standards/mets/profiles/00000010.html
Fogel, P. & Hetzner, E. (n.d.). 7train METS Generation Tool.
Copyright the University of California Regents. Retrieved
November 15, 2009, from http://seventrain.sourceforge.net/
Google. (2009). Tesseract-ocr. Retrieved November 14, 2009, from
http://code.google.com/p/tesseract-ocr/
ImageMagick Studio LLC. (2009). ImageMagick. Retrieved
November 14, 2009 from http://www.imagemagick.org/
script/index.php
Library of Congress. (2009a). EAD Encoded archival description
(Version 2002). Retrieved November 14, 2009, from
http://www.loc.gov/ead
Library of Congress. (2009b). METS Metadata encoding &
transmission standard. Retrieved November 14, 2009, from
http://www.loc.gov/standards/mets/
Library of Congress. (2009c). MODS metadata object description
schema. Retrieved November 14, 2009, from http://www.loc.gov/
standards/mods/
LOCKSS. (2008). What is the LOCKSS program? Retrieved
November 14, 2009, from http://www.lockss.org/lockss/Home
Loewald, T. (2009a). Acumen. Retrieved November 15, 2009, from
http://acumen.lib.ua.edu/
Loewald, T. (2009b). Archivists utility. Retrieved November 15,
2009, from http://lb-416-003.lib.ua-net.ua.edu/notes/?f=
Archivist%20Utility.txt
Network of Alabama Academic Libraries. (n.d.). Alabama Mosaic.
Retrieved November 14, 2009, from
http://www.alabamamosaic.org/
Network of Alabama Academic Libraries. (2009). The Alabama
Digital Preservation Network (ADPNet). Retrieved November 14,
2009, from http://www.adpn.org/
OCLC. (2009a). CONTENTdm Digital collection management
software. Retrieved November 15, 2009, from
http://www.contentdm.com
OCLC. (2009b). FirstSearch Online reference. Retrieved November
15, 2009, from http://www.oclc.org/firstsearch/
OCLC. (2009c). Multi-site server. Retrieved November 15, 2009, from
http://www.oclc.org/firstsearch/
University of Alabama Libraries. (2010a). Septimus D. Cabaniss
Papers digitization project. Retrieved March 9, 2010 from
http://www.lib.ua.edu/libraries/hoole/cabaniss
University of Alabama Libraries. (2010b). UA libraries digital
services planning and documentation. Retrieved March 9, 2010
from http://www.lib.ua.edu/wiki/digcoll
W3C. (2005). XML:id Version 1.0. Retrieved November 15, 2009,
from http://www.w3.org/TR/xml-id/
Digitizing Colorado State University's
Historic Photograph Collection: A Case
Study
Beth Oehlerts (Colorado State University Libraries)
Abstract
This chapter discusses the experiences of creating a medium-sized
digital collection of the earliest photographs from Colorado State
University's Historic Photograph Collection. The entire collection of
500,000 photographs chronicles the history of Colorado State
University, the city of Fort Collins, and Rocky Mountain National
Park. Digitizing the first phase of this vast collection included
capturing 5,500 images from glass plate negatives, 750 images from
magic lantern slides, and 7,500 images from gelatin nitrate prints.
These formats were chosen because they are the oldest and most
fragile. The project was not without challenges, including utilizing
untrained staff and students, coping with equipment problems, and
creating avenues of communication to more than 40 people involved
in the creation of the collection. This chapter will discuss these
challenges and how we worked to resolve them.
Keywords: Archives, Digital collaboration, Digital collections, Digital
projects, Digital scanners, Gelatin nitrate prints, Glass plate negatives,
Magic lantern slides, Photograph collections.
Introduction
Beginning in the late 1990s, Colorado State University Libraries
(CSUL) started large-scale digitization activities in partnership with
the Colorado Digitization Project (CDP). The first of these was the
Warren and Genevieve Garst Photographic Collection of wild animal
photographs donated to CSUL and digitized with CDP funding
(http://lib.colostate.edu/wildlife/). The Sidney Heitman Germans from
Russia Collection was another early digitization project
(http://lib.colostate.edu/gfr/index.html) funded by the CDP. These
projects, and a series of smaller ad hoc projects over the next three
years, were created following the CDP's best practices guidelines and
gave a few staff the opportunity to learn digitization.
In 2006 CSUL received the University Historic Photograph
Collection (UHPC), a collection of 500,000 photographs documenting
the history of Colorado State University, Fort Collins, and Rocky
Mountain National Park. The collection came from the University's
Office of Instructional Services and was given to CSUL's Archives and
Special Collections Department with the understanding that it would
be preserved and made accessible to a wider audience through
digitization. Planning for the digitization of the first phase began in
early 2007, as did creation of a finding aid for the earliest images in
the collection, which included glass plate negatives, magic lantern
slides, and gelatin nitrate prints from cellulose nitrate negatives.
These photographs are also the most fragile materials in the
collection. The goal was to digitize approximately 5,500 glass plate
negatives, 750 lantern slides, and 7,500 gelatin prints.
Although we had no budget line dedicated solely for digitization,
our work did have the support of both the Colorado State University
(CSU) and CSUL's administrative teams. Funds were provided by the
Libraries' administrative team to purchase digital scanners, including
an archival-quality overhead scanner used for later archival
digitization projects. Administrative support provided extra funding
for the project when we needed to hire students and acquire additional
server space to house our digital master files.
Literature Review
Recent literature on digitization often focuses on the technical and
preservation-related issues that arise following the creation of a
digital collection; less of it addresses the staffing issues
encountered when creating digital collections.
Boock and Vondracek (2006) found that when institutions begin
digitization activities they often add these tasks to staff
responsibilities rather than hire new staff dedicated to digitization. In
a 2005 survey of ARL libraries, they found a majority of the
responding institutions capitalized on the existing knowledge and
skills of staff, encouraging current positions to evolve rather than
hiring new digitization staff, as was the case at CSUL. Boock continues
the discussion, focusing on how the Oregon State University Libraries
(OSUL) reassigned several staff positions to a new digitization
production unit, initially using staff for scanning, quality control
reviews, and metadata creation using a metadata schema (Boock,
2008). OSUL relies on student employees for almost all of its digital
imaging and metadata assignment, performing no quality control
reviews on their work. In examining the organizational implications of
digitizing, Sutton (2004) recalls how early digitization efforts were
seen as temporary endeavors, requiring the temporary re-assignment
of staff.
D'Andrea and Martin (2001), reporting on digitization workflows
at Temple University, discuss utilizing part-time student staff to
supplement the work of digitization staff. D'Andrea and Martin note,
and our experiences confirm, there are positives and negatives in
hiring students to work on digitization projects. The authors
recommend hiring students whose interests match the project, not
just those who apply for the job, and fully informing students about
the nature of the work.
Establishing Best Practices
In 2005, in response to the growing need for local standards and
procedures to guide the creation of CSUL's digital collections, CSUL
created the Digitization of Local Collections Task Force and charged it
to review existing national and international standards and best
practices, and to recommend the steps necessary in building high-
quality local digital collections. Among its findings, as reported in the
final report, Digitization of Local Collections (Digitization of Local
Collections Task Force, 2005), the Task Force identified a lack of
formal project management, a need to prioritize projects, and the
need to determine the impact in-house digitization would have on
staffing. Based on the Task Force's final report and the Western States
Digital Imaging Best Practices, version 1.0 (Western States Digital
Standards Group, 2003), CSUL then created a series of local best
practices covering all aspects of the digitization process, starting with
guidelines on how to handle and care for materials to be scanned. Our
policy is to capture an image that is a true representation of the
original. To do this all the objects in the UHPC digitization project
were digitized as archival objects, with no alteration made to the
master image.
With some of the glass plate negatives, lantern slides, and gelatin
nitrate prints more than 100 years old, including glass plates that are
cracked or chipped, we scan an object once to reduce the chance of
damaging fragile materials by repeated handling. Master images are
created as TIFF (Tagged Image File Format) files. The glass plates
were scanned at 1200 dpi, the magic lantern slides at 2400 dpi, and
the gelatin nitrate prints at 1200 dpi. The master images were
deposited in a project folder on a central server prior to the
application of metadata and ingest into our digital repository.
Our best practices documentation, including national and
international digitization standards documentation, resides on the
CSUL staff wiki where it can be accessed by all employees. The wiki
includes a wide range of information related to creating CSUL digital
collections, including how to operate the digital scanners by format
and instructions for creating project-tracking Excel spreadsheets that
record the work of each staff member. The information gathered on
these spreadsheets enables the supervisor to monitor and evaluate the
work of the scanning staff, identify inefficiencies, and keep the project
on schedule. Workflow analysis helped support our proposals to hire
additional student scanning staff, purchase additional scanning
equipment, and identify changes in workflow that will improve future
digitization projects.
Project Planning
UHPC digitization project planning began in early 2007, with
discussions and project meetings between the project archivist, the
digitization librarian, and the metadata librarian. The project's
participants knew what would be digitized, what metadata was
needed, and where the collection would reside, but we had to develop
a road map to help us achieve these goals.
That road map began to take shape with the UHPC finding aid,
created by the UHPC project archivist. The digital projects librarian
created the UHPC digitization project-tracking documentation and
identified staff for scanning, created staff scanning schedules, and
arranged for staff access to the Archives Annex where the materials
were housed. The metadata librarian, working with the project
archivist, developed a plan to use the information in the finding aid
for the metadata. Then the digital services librarian developed the
procedures for loading them into our content management system.
In September 2007, the UHPC digitization project scanning
started with a small group of test scans to determine the best
procedures for creating the digital images. Using CSUL's standards-
based digitization guidelines, staff created approximately one dozen
images. These scans were reviewed for accuracy and quality by both
the project archivist and the scanning supervisor; when the pilot
images met the requirements for the project, scanning began.
Utilizing Untrained Staff
Initially the number of trained digital scanning staff at CSUL was
relatively small, with a Library Technician I and a Library Technician
II, who worked on earlier digitization projects and had experience
with scanning photographs and text documents, and a Library
Technician II, with limited scanning experience. A Library Technician
III supervised the work. These four staff also had other job
assignments: the Library Technician I assisted in preparing materials
for the bindery and assisted with a large inventory project; and the
Library Technician IIs performed database maintenance tasks and
also assisted with the inventory project. The Library Technician III,
who reports directly to the digital projects librarian, supervises
CSUL's in-house digital scanning, the database maintenance unit and
the inventory project. She is responsible for creating and adjusting
scanning staff schedules, monitoring scanning progress, contacting
Library Technology Services (LTS) staff to maintain and repair
scanning equipment, and performing quality control reviews of the
scanned images.
The project began with 56 hours of scanning per week. Unable to
hire additional staff, we examined staff duties to identify tasks that
could be streamlined or eliminated altogether. A decrease in the
number of print journal subscriptions and a reduction in the number
of titles sent to a commercial bindery meant those saved hours could
be applied to digitization. The same was true with tasks associated
with database maintenance. For example, when approached by the
CSUL Government Documents Unit to transfer the government
documents database maintenance tasks to the Government
Documents unit, we agreed and made the change. There were also
some basic database maintenance tasks that could be performed by
students, saving more time for digital activities.
We identified one of our Library Technician IIs whose knowledge
of and experience with scanning, combined with her skills as an
instructor, made her an excellent teacher. With these skills she trained
our staff to create the high quality digital images we needed and now
trains all our scanning staff and students.
Experience has shown us that the level of training needed for
creating digital images varies by individual and must be tailored to the
individual. Each person brings various levels of experience and
understanding to digital scanning. Aside from the level of experience,
the unique nature of the materials, coupled with unfamiliar models of
equipment, requires individualized training. Training begins with
scanner instruction and viewing Cornell University Library's Moving
Theory Into Practice Digital Imaging Tutorial (Cornell University
Library, 2003). Our digital imaging trainer spends 2-3 weeks teaching
the trainee the local, national, and international standards and best
practices, how to operate all of the digital scanning equipment, how to
handle the materials being digitized, and any necessary project-
specific scanning requirements. Staff with multiple job
responsibilities usually need one year to acquire the skills to create high quality
archival digital images.
In December 2007, a couple of months into the project, the
scanning was not progressing as quickly as expected. We hesitated to
consider using student employees due to the fragile condition of the
glass plate negatives and magic lantern slides, but there was no other
option; the project needed to keep on schedule. We trained our most
experienced and mature student employee first and eventually hired
more students to assist with scanning. The first three hourly student
scanners spent a combined average of 36 hours per week scanning. As
summer approached we asked and received permission to offer
summer employment to two of the students. The ability to keep them
working during the summer, at a total of approximately 60 hours per
week, helped us keep the project on schedule.
Midway through the project, in the spring of 2008, four members
of the copy cataloging staff joined the UHPC digitization project. They
were trained to assist the metadata librarian with metadata creation, a
natural progression of their duties. The copy catalogers spent
approximately 20 hours per week throughout the project assisting
with metadata.
The UHPC project archivist, our authoritative resource for the
project, was also our daily contact for the project. Creating images at
the Annex proved to be a benefit for the scanning team because the
project archivist's office is located at the Annex. She was available to
answer questions about the extremely fragile materials and instruct
staff on how they should be handled. At the start of digitization, the
project archivist was responsible for performing quality control
reviews of the images, but we soon discovered this arrangement was
not feasible. The project archivist was still in the midst of processing
the collection's 500,000 images, and the logical solution was to have
the scanning supervisor review the scans. To reduce the number of
possible errors, scanning staff perform a review of their own scans
prior to the end of their work day. Staff are expected to correct any
errors found before a review by the supervisor. If the supervisor
discovers images that need to be rescanned those are given back to the
person who created the image. Correcting scanning errors became a
learning opportunity and the person responsible for the scanning
error was responsible for rescanning the image. If the problems were a
result of malfunctioning scanning equipment, quality control reviews
were performed on 100% of the images until the problems were
corrected.
Additional changes in job responsibilities occurred in the spring
of 2008 when we replaced most staff working on the large inventory
project with student employees. The change gave us approximately 32
additional hours per week to devote to the UHPC digitization project.
To accommodate the increase in hours we met with the archivists and
the Archives and Special Collections Coordinator to request an
extension of the hours at the Annex. The Archives staff was
understandably reluctant to have others in the Annex when an
archivist is not present, but they agreed to adjust the schedules of staff
in the Annex and to extend the hours. The increase in the rate of
production made meeting the project deadline a more realistic goal.
By July 2008 all of the glass plate negatives and magic lantern
slides had been scanned and once the quality control reviews and all
rescans were complete we notified the metadata librarian. She began
the process of preparing the metadata for the copy catalogers. Upon
completion, the metadata librarian reviewed their work and requested
corrections where needed. The metadata librarian then notified the
digital services librarian, who began the process of ingesting the
images and metadata into the digital repository.
During the life of the UHPC digitization project we lost and
gained staff and student employees. The impact on the project was
challenging at times. In the late spring and again in the fall of 2008,
two Library Technician I staff undertook several weeks of intensive
training to acquire the skills to fill the void created by the loss of two
employees. The new staff were unfamiliar with digital scanners, and they
had little or no experience handling fragile photographic materials.
Fortunately they quickly became proficient and by the end of 2008 we
added 24 hours a week back to our scanning schedule.
The final format to be digitized was the gelatin nitrate prints.
Work on digitizing this format began in August 2008 when we hired
eight work study students to perform the gelatin nitrate print
scanning. We began the fall semester with 132 hours per week: 80
hours of student scanning time, 32 hours of staff scanning, training,
and quality control review, and 20 hours of staff metadata creation.
We hoped the students would be able to work largely on their own
after training. The scanning supervisor worked closely with them for
the first couple of weeks; she then let them work on their own. A short
while later, while performing quality control reviews of their work she
noticed the students were making a large number of errors, many of
them basic scanning errors. Our experiment to reduce the close
supervision of the students was not a success. Distractions, or perhaps
the lure of the holidays, resulted in a lack of concentration.
Scanning the UHPC collection took place at a site across campus.
Saving the images to the Libraries' server from this remote site, and
searching the web at the same time, caused the computers and
scanners to slow and occasionally created serious technical problems.
We now block internet access at all our scanning stations and more
closely supervise the students' work.
There are advantages and disadvantages with using students to
work on digitization projects. The advantages include lower labor
costs, a more flexible workforce, and the opportunity to give students
valuable skills and good work habits. One disadvantage is the
substantial investment in teaching students the skills and knowledge
needed to create high-quality digital images only to have them
graduate after a year or two of employment. Due to the repetitive
nature of the work, it is difficult for some students to concentrate on
scanning images. Many times students are not able or willing to work
during school breaks. In spite of the disadvantages, we found that the
positives of employing students outweigh the negatives.
It is difficult to firmly estimate the number of hours of scanning
time you will need when planning a digitization project. To
compensate for staff leave and the uncertainty of student employment
during breaks in the academic year or during final examinations, it is
a good idea to budget extra time for digitization. Project downtime
may occur because of equipment problems. In our case, while the
scanners were being repaired we had to bring the scanning to a halt
and reassign staff and students to other projects at CSU's Morgan
Library. In addition to the scheduling problems on the UHPC
digitization project, staff scanning schedules had to include 15
minutes to walk across campus to the Archives Annex, where the
scanning was done. By using our project-tracking documentation we
were able to assess the progress of the project at various points and
make any adjustments needed to keep us on schedule. The willingness
of everyone involved to adapt and remain flexible was a major factor
in the success of the project.
Overcoming Equipment Problems
The project began with just two Microtek ArtixScan 1800f flatbed
scanners. The Microtek scanners were purchased prior to the project
startup and were chosen for their design, which includes a drawer
where the glass plate rests emulsion-side up without touching glass or
the light source. This separation is desirable when scanning fragile
glass plate negatives. The scanners were installed and calibrated by
the LTS staff, the equipment experts. During the first four months of
scanning we encountered numerous equipment problems, including
having to recalibrate the scanners every day, sometimes several times
a day to eliminate colored lines on the digitized images.
Equipment problems plagued us through much of this project, and
adding staff to the scanning team did not help the situation. With more
people scanning, the scanners were in operation for 50 or more hours
per week. This high rate of production took a toll on all the scanning
equipment and the scanners were showing the stress of overuse. We
needed more scanning equipment for both technical reasons and to
increase production. The digital projects librarian recommended staff
operate two scanners at a time, as one way to increase production.
With 20 minutes to scan one glass plate, staff had time to prepare one
image for scanning while the other was being scanned. We decided to
follow this recommendation and requested that CSUL purchase two
Epson Expression 10000 XL scanners, a model that best suits our
needs. The scanners arrived in January 2008 but then we had to wait.
It took LTS a couple of weeks to install the scanners. At the same time
one of our Microtek scanners malfunctioned. Attempts to repair the
scanner were unsuccessful and the only option was to return it to the
manufacturer for refurbishment. The equipment installation delays
and failures caused us to suspend the UHPC scanning for two weeks.
We had to reassign staff and students to other projects in the interim.
By February 2008 the four scanners were in place and working,
but we then began experiencing other problems. Our new Epson
scanners were producing Newton's Rings, a series of concentric,
alternating light and dark rings centered at the point of contact
between the glass surface of the scanner and the glass plate negative,
on the scanned images (Illueca, Vazquez, Hernandez, & Viqueira,
2002). The older scanners were beginning to produce banding
artifacts (straight lines) on the images. As this only happened with the
glass plate negatives, we switched from scanning glass plate negatives
to scanning the magic lantern slides until we could determine a
solution to the problems. A resolution to the Newton's rings problem
came from the digital projects librarian, who thought an acid-free
paperboard (barrier board) frame, similar to mat boards used in
picture framing, might resolve the problem. The frame raises the glass
plate from the scanner glass just enough to allow air to circulate
between the plates of glass and eliminates the Newton's rings. During
the digitization of the University of North Carolina at Chapel Hill's
William Blake Archive, scanners encountered Newton's Rings when
creating images from transparencies. Their solution was to scan the
images directly and not through glass (Viscomi, 2002). This solution
was not practical for us. Using a frame to separate the glass plate
negatives, which are extremely fragile, from the scanner glass
provided a safe method to protect the emulsion side of the plates. The
banding artifacts problem was resolved when we began a daily
cleaning and re-calibrating of the scanners.
There was also the challenge of creating quality images from over-
exposed and under-exposed glass plate negatives. We adjusted the
scanner to capture either a darker or lighter image to correct the poor
exposure. One point of pride for us was that we did not break or
damage any of the fragile glass plate negatives in the collection.
Improving Interdepartmental Communication
Prior to the UHPC digitization project most digitization work was
small ad hoc projects that did not require planning meetings, project-
tracking documentation, detailed condition assessment guidelines, or
extensive quality control reviews. As a result, not everyone
participating in creating digital projects was aware of a project's
status. Our digitization procedures changed with digitizing the UHPC
collection. During the digitization project not all of us worked in the
same location. The project archivist worked in one building, the
metadata librarian, the digital projects librarian, and the digital
services librarian worked in another. We needed to develop a project
management structure that could foster communication and promote
collaboration. Project communication, which was taking place via
email, often resulted in further 'follow-up' emails. We realized that
everyone involved in the project needed to know what was happening
and why.
In early 2008 the Repositories Matrix Team, CSUL's
administrative and policy-making team for digital activities, created
the Digital Projects Management Plan Working Group (Working
Group). The Working Group's charge is to monitor the progress of all
digitization projects. It includes representatives from Archives and
Special Collections, Metadata and Preservation Services, and Digital
Repositories Services, that is, every department participating directly in
digitization. The Working Group began meeting weekly in February
2008. The initial focus was on the issues and problems surrounding
the UHPC project. Now members report on digitization project news,
digital equipment problems and equipment purchases, staffing issues,
and conclude with a 'round robin' of project status reports. All policy
questions are forwarded to the CSUL Repository Matrix Team.
Building on the Working Group model, we now schedule pre-project
planning meetings for all new digitization projects. These meetings
focus on the scope of the project, staffing needs, equipment needs, and
the creation of a digitization timeline. Attendees include faculty and
staff who will be involved in the creation of the digital collection.
The collaboration that formed during the UHPC project
contributed much toward improving interdepartmental
communication. The project required a close working relationship
along with frequent meetings and emails between the project
archivist, the digital projects librarian, the metadata librarian, and the
digital imaging staff. Group members were willing to share their
knowledge and skill with others and remain flexible while workflows
shifted and changed course during the life of the project. Those
involved with building digital collections at CSUL have learned that
building quality digital collections requires planning, good
communication, and a commitment to a collaborative endeavor.
Summary/Lessons Learned
Digitization of the UHPC collection provided several lessons. Within
months of commencing the project, with enough data to create an
accurate picture of our progress, we realized that we needed more
staff, more equipment, and more time to complete the project. Because
hiring additional digitization staff was not possible, our only option was to
train staff in other units to assist with portions of the work. As
mentioned above, the willingness of everyone involved to remain
flexible was a major factor in the success of the project.
When budgeting time for a digitization project, experience taught
us that 20% more time should be added to the timeline, to allow for
problems of all types to be resolved. Developing a realistic project
budget and project timeframe will also help administrators and others
understand the true costs of building a valuable digital collection. A
short pilot project, conducted prior to the start of the project, can
reveal quite a lot. It will show you how much time it really takes to
scan the materials; which scanner settings should be used for a
particular format; and whether it is possible to make textual
documents searchable. Although we did conduct a short pilot, it was
designed to learn how to operate the scanning equipment. We should
have gathered other information, such as how long it takes to scan a
glass plate negative, because soon after the scanning began we
realized that we had wildly underestimated the time we thought was
needed to scan these negatives. The 5 minutes scheduled per scan in
fact turned into 20 minutes. A more comprehensive pilot would have
revealed, among other things, what was technically feasible.
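The arithmetic behind this lesson can be sketched in a few lines of Python. The figures taken from the chapter are the 5 minutes planned per scan, the 20 minutes actually needed, the recommended 20% contingency, and the 56 scanning hours per week with which the project began. The item count of 1,000 and the function names are hypothetical, chosen only for illustration; this is a rough planning aid, not the tracking method used on the project.

```python
import math

SCAN_HOURS_PER_WEEK = 56   # from the chapter: the project began with 56 hours per week
CONTINGENCY = 0.20         # from the chapter: add 20% more time to the timeline


def estimate_scanning_hours(item_count, minutes_per_scan, contingency=CONTINGENCY):
    """Return (base_hours, padded_hours) for scanning item_count items."""
    base_hours = item_count * minutes_per_scan / 60
    padded_hours = base_hours * (1 + contingency)
    return base_hours, padded_hours


def weeks_needed(total_hours, hours_per_week=SCAN_HOURS_PER_WEEK):
    """Round up to whole weeks at a fixed number of scanning hours per week."""
    return math.ceil(total_hours / hours_per_week)


# Hypothetical batch size; the chapter does not give a per-format item count.
ITEM_COUNT = 1_000

planned_base, planned_padded = estimate_scanning_hours(ITEM_COUNT, 5)
actual_base, actual_padded = estimate_scanning_hours(ITEM_COUNT, 20)

if __name__ == "__main__":
    print(f"Planned: {planned_padded:.0f} hours, {weeks_needed(planned_padded)} weeks")
    print(f"Actual:  {actual_padded:.0f} hours, {weeks_needed(actual_padded)} weeks")
```

Running the same calculation with timings measured during a pilot, rather than assumed ones, is the kind of check a more comprehensive pilot would have made possible.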
The UHPC digitization project underscored for us the importance
of regularly sharing information with the project's participants. Good
communication, as mentioned above, was vital to understanding the
UHPC project's goals and unique digitization requirements. With
more than 40 people involved in its creation, sharing information
helped us understand how to handle the fragile archival materials,
avoid many image rescans, understand the reasons for many of our
equipment problems, create solutions, and keep on track to meet the
project's deadlines.
Conclusion
By June 2009 the photographs in the first phase of the project were
digitized, and the images and metadata began to be loaded into the
digital repository. Each digitization project is unique, and each has the
potential to present new challenges. Though it might not be possible
to anticipate every outcome, there are some things that can be done to
prepare the project for success. Obtain the support of your
administration or funding agency for the project; develop good project
planning skills and implement them; create tools for communicating
with all project participants and document all the processes and
workflows of the project. Subsequent digitization projects from this
vast collection will benefit from what we learned during the first phase
of the UHPC digitization.
References
Boock, M. (2008). Organizing for digitization at Oregon State
University: A case study and comparison with ARL libraries.
Journal of Academic Librarianship, 34, 445-451.
Boock, M., & Vondracek, R. (2006). Organizing for digitization: A
survey. Libraries and the Academy, 6, 197-217.
Cornell University Library. (2003). Moving theory into practice
digital imaging tutorial. Retrieved on March 31, 2010 from
http://www.library.cornell.edu/preservation/tutorial
D'Andrea, P., & Martin, K. (2001). Careful considerations: Planning
and managing digitization projects. Collection Management, 26(3),
15-28.
Digitization of Local Collections Task Force. (2005). Digitization of
local collections. Retrieved on March 31, 2010 from
http://digitool.library.colostate.edu/
Illueca, C., Vazquez, C., Hernandez, C., & Viqueira, V. (2002). The
use of Newton's rings for characterizing ophthalmic lenses.
Ophthalmic and Physiological Optics, 18, 361-362.
Sutton, S. (2004). Navigating the point of no return: Organizational
implications of digitization in special collections. Libraries and the
Academy, 4, 233-243.
Viscomi, J. (2002). Digital facsimile: Reading the William Blake
Archive. Computers and the Humanities, 36, 27-48.
Western States Digital Standards Group. (2003). Western States
digital imaging best practices. Version 1.0. Retrieved on March 31,
2010 from http://www.bcr.org/dps/cdp/best/wsdibp_v1.pdf
Entering the Digitization Universe: One
Catalog Librarian's Experience at an
Academic Library
Mary Rose (Southern Illinois University Edwardsville)
Abstract
This chapter describes a catalog librarian's experience with an
academic library's digital collection initiative. The author discusses
how the library handled technical challenges and established policies
and procedures during the process of creating its first digital
collection. The effects of external pressures from consortial
requirements and organizational change are also discussed. The
author describes technical decisions specific to the first project and
more general technical issues like customization decisions and
decisions about filenaming convention. The processes involved in
establishing selection criteria and rights and permissions policies are
described. The author also provides a brief overview of three
subsequent digital projects. The author concludes by speculating on
how the library's digital presence will grow in the future.
Keywords: Academic libraries, Catalog librarians, CONTENTdm,
Digitization, Digital collections.
Lovejoy Library at Southern Illinois University Edwardsville (SIUE)
entered the universe of searchable digitized collections in 2008. We
encountered several issues along the way to completing our seminal
project. There were technical challenges to be met, and we had to
establish procedures and policies. We also encountered external
pressures due to our reliance upon consortial services and as a result
of organizational changes at the University. This chapter is a narrative
of this experience and a speculation about the future.
Background: the preliminary steps toward establishing a
digital initiative and vision
In 2006-2007, Lovejoy Library administration took the first steps
toward establishing a digital projects initiative by forming a
CONTENTdm committee and acquiring access to CONTENTdm
software as a member of the Consortium of Academic and Research
Libraries in Illinois (CARLI). The software is installed and maintained
on CARLI's server. Two Lovejoy staff members received training in
the use of CONTENTdm; however, neither staff member was
empowered with a mandate to create a digital collection. The initiative
essentially stalled. When I joined the University as the Library's first
catalog and metadata librarian in May of 2007, I recognized that
getting Lovejoy fully engaged in the creation of digital collections was
a main priority of the position. The aforementioned staff members
immediately and gratefully handed their CONTENTdm workbooks
over to me and notified the consortium that I was now the primary
contact for coordinating the Library's use of this software. I had never
previously used CONTENTdm but became intimately familiar with it
over the course of the next several months. Lacking training or
experience, I relied heavily on support services at CARLI to effectively
leverage the software. I also took a generic metadata creation
workshop and studied Dublin Core.
I quickly became aware that two digitization projects were being
spearheaded by two tenured faculty librarians as candidates for our
initial digital collection: one somewhat aggressively as a grant project
and the other more casually without the impetus of a grant. Being
naive with regard to the politics of the organization, I deferred to
others who decided to give precedence to the grant-funded project.
The CONTENTdm committee subsequently decided that the Library
needed a process for evaluating and prioritizing potential digital
collections. Perhaps this was a response to the way in which resources
had been committed to the first project because of a schedule driven
by external funding. Or perhaps it was the usual librarian caution that
any new undertaking will grow to unmanageable proportions if fed too
liberally. Perhaps the desire for oversight was motivated by
recognition that the shape of our accumulated digital collections over
time would define the character of the Library to a significant degree,
and whether this was ad hoc or directed was not a matter of chance
but of choice. Whatever the reason, a digitization selection
subcommittee to the collection management committee was proposed
by a tenured library faculty member at the first CONTENTdm
committee meeting I attended.
The digitization selection subcommittee became entwined with
the Library's vision regarding digital initiatives. The subcommittee's
charge was officially established as being the body responsible for
receiving and evaluating digitization project proposals and making
recommendations to the parent collection management committee
regarding acceptance and prioritization of said proposals. The
advisory group comprising the subcommittee included all of the
library faculty administrators plus the Director of Development
(essentially the marketing administrator) and the Director of
Academic Computing. The subcommittee was rounded out by the
Catalog and Metadata Librarian (me), the Electronic Resources
Librarian, the Archivist and Special Collections Librarian (serving as
chair), and whichever subject librarian was participating in a specific
digitization proposal. The group resolved to create a proposal form to
guide proponents in describing the subject, extent, rationale, funding,
etc. of their project ideas. Selection would be accomplished by
carefully evaluating the relevance of a project to the Library's mission
and the advantages a digital platform was expected to provide for the
particular included items, such as wider accessibility for heavily used
resources, easier use of delicate or cumbersome materials, and
improved access to text-rich content through electronic searchability.
Selection criteria suggested by the Northeast Document Conservation
Center were incorporated into the subcommittee's official position.
The Center frames selection around three basic questions (Gertz,
2007):
• Should [the materials] be digitized? Is the collection important
enough, is there enough audience demand, and can sufficient
value be added through digitization to make it worth the cost
and effort?
• May they be digitized? Does the institution have the
intellectual property rights to permit legal creation and
dissemination of a digital version?
• Can they be digitized? Will digitization achieve the goals of the
project, given the physical nature of the materials and their
organization, arrangement, and description? Does the
institution have the technical infrastructure and expertise to
create digital files and make them available to users now and
in the future?
Challenges encountered during the first digital project
Our pilot digital collection was the KMOX sheet music
digitization project. Lovejoy Library's Music Special Collections
includes a gift from KMOX of over 48,000 music titles compiled by
the St. Louis-area radio station: the live studio orchestra's complete
performing music library. The titles date from the early 1900s. A
subset of this collection, identified as being published prior to 1923
and hence in the public domain, became the target digital collection.
Academic Computing, an entity under the administration of the
Library's dean, scanned the sheet music in color at 600 dpi, enlarging
it 400% during scanning, and saved the images as uncompressed TIFF files. I began
working on the project in earnest in January of 2008, with my first
real technical task being to understand CONTENTdm enough to
design a structure to showcase the collection effectively. Eventually I
settled on a strategy: Each piece of sheet music would be what is
known in CONTENTdm terminology as a compound object. Metadata
would be supplied at the object level, meaning each piece of sheet
music would have its own metadata but the individual pages
comprising a given title would not be described separately.
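As a rough illustration of this structure, the sketch below models one piece of sheet music as a single object that carries one set of metadata and an ordered list of page images, with no metadata attached to the individual pages. The class name, field names, and file names are hypothetical and do not reflect CONTENTdm's internal representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SheetMusicObject:
    """One title: object-level metadata plus its page images, in order."""
    metadata: Dict[str, str]                              # described once, for the whole piece
    page_files: List[str] = field(default_factory=list)  # pages carry no metadata of their own

    def add_page(self, file_name: str) -> None:
        """Append the next page image to the object."""
        self.page_files.append(file_name)

    @property
    def page_count(self) -> int:
        return len(self.page_files)


# Placeholder values only; not drawn from the KMOX collection.
example = SheetMusicObject(metadata={"Title": "Example Song Title"})
for page in ("example_001.tif", "example_002.tif", "example_003.tif"):
    example.add_page(page)
```

Keeping the description at the object level means that adding or rescanning a page changes only the list of files, not the metadata record.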
The Fine Arts Librarian had obtained grant monies to hire
graduate student assistants to help with the project. I trained the
graduate assistants to provide what catalogers consider
descriptive metadata. This is the metadata that is transcribed from the
piece being described. In this case, descriptive metadata included the
song title, first line of the refrain, and publication information. I also
showed the students how to search the Library of Congress's free
online authority file (Library of Congress, 2009) for authorized forms
of names for the lyricists, composers, arrangers, performers, and/or
illustrators credited on the pieces. Finally, I created standard notes for
the student assistants to apply, such as "piano, vocal" for an
instrumentation note, "One color (purple)" describing the cover art,
and "Includes advertisements" as a miscellaneous note. I reviewed
their work and completed the metadata with subject analysis, detailed
cover art description, and additional notes.
Learning how to use CONTENTdm required a tremendous
amount of time and energy during this first project. The effort was
amply rewarded, however, since the functionality provided by the
software suited our application perfectly. The software supports batch
population of a collection via tab-delimited files. This facilitated
collaborative metadata creation, since the graduate student workers
could create Excel spreadsheets with preliminary metadata for groups
of titles and then pass them on to me to complete. I subsequently
converted the spreadsheets into tab-delimited files and uploaded the
metadata into CONTENTdm along with the corresponding images.
The compound object structure, in which several images comprise one
digital entity, elegantly matches the character of multi-page sheet
music. The software also provides the means for creating index boxes,
which enhance access to the content beyond full text searches of the
metadata. We decided to use this functionality to create index boxes
for composers, lyricists, and subjects for this project.
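The spreadsheet-to-tab-delimited step described above can also be scripted. The following is a minimal sketch using Python's standard csv module; it assumes the spreadsheet rows have already been read into dictionaries, and the column names and sample values are hypothetical placeholders rather than the project's actual field list. The exact layout CONTENTdm expects should be confirmed against its own documentation.

```python
import csv
import io

# Hypothetical column names, loosely based on the descriptive metadata named in the text.
FIELDS = ["Title", "First line of refrain", "Composer", "Lyricist", "Notes", "File name"]


def rows_to_tab_delimited(rows, fieldnames=FIELDS):
    """Serialize a list of metadata dictionaries as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fieldnames)
    for row in rows:
        # Missing fields become empty cells so every line has the same number of columns.
        writer.writerow([row.get(name, "") for name in fieldnames])
    return buffer.getvalue()


# Placeholder record; the values are illustrative, not taken from the collection.
sample_rows = [
    {
        "Title": "Example Song Title",
        "Composer": "Composer, Example",
        "Notes": "piano, vocal",
        "File name": "example_001.tif",
    }
]

tab_text = rows_to_tab_delimited(sample_rows)
```

Because the delimiter is a tab, values that contain commas, such as the standard note "piano, vocal", pass through without extra quoting.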
As stated previously, Lovejoy Library's digital collections are
created under the consortial umbrella, using CARLI's CONTENTdm
server. CARLI's collection of member libraries' digital collections is
OAI-harvestable, and CARLI provides a means for member libraries
to obtain usage statistics. But with these advantages come some
constraints. CARLI requires all of its hosted collections to contain
certain metadata fields, including (among others) Rights and
Language fields mapped to the corresponding Dublin Core elements
and a Collection field mapped to Relation. The Rights field
requirement motivated our CONTENTdm committee to address the
thorny issues of intellectual property more promptly than we might
perhaps have otherwise; as it was we needed to formulate a policy
before publishing the KMOX collection. This proved to be the
committee's most important task. The consortium specified that the
Rights field should identify the intellectual property rights status of
the digital resources in the collection and provide direction for users
to contact the owner. This field could also be used to inform users of
fair use laws. The committee consulted with the University's legal
counsel to develop a rights and permissions policy in conformance
with these guidelines. The digital rights and permissions statement
that eventually evolved through the committee's deliberations
authorizes "fair use" of the digital resources, provides references
describing the legal limits of fair use, specifies the form of attribution,
and provides the means for applying for additional permissions
(Lovejoy Library, 2009).
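A requirement of this kind lends itself to a simple check before records are uploaded. The sketch below is hypothetical and is not a CARLI or CONTENTdm tool: it covers only the three fields named above (the consortium requires others as well), and the record layout is an assumption made for the example.

```python
# Partial, assumed list: only the three required fields named in the text,
# each paired with the Dublin Core element it is mapped to.
REQUIRED_FIELDS = {
    "Rights": "rights",
    "Language": "language",
    "Collection": "relation",
}


def missing_required_fields(record: dict) -> list:
    """Return the required field names that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record with one blank and one absent required field.
record = {"Title": "Example title", "Rights": "See rights statement", "Language": ""}
missing = missing_required_fields(record)
```

Running such a check over every row of a metadata file before upload would flag incomplete records early, while they are still easy to correct in the spreadsheet.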
SIUE is responsible for the resources comprising our digital
collections. Before we begin a digital collection project, we need to
establish our right to create these component digital resources. This
can be accomplished by using source materials in the public domain,
securing permission for digitization and publication from the owner of
the source materials, or actually purchasing the right to digitize and
publish source materials. However, the rights status of the source
materials is not always unambiguous. For the KMOX sheet music
project, items within the public domain were identified as such by
having a copyright or publication date prior to 1923 printed on the
item. However, as I completed the metadata I noticed that some of the
covers exhibited images clearly indicating that they were created after
that date. For instance, the cover of "Come to the Fair" featured a
photograph of the Trylon, Perisphere, and Helicline at the 1939 New
York World's Fair, despite the fact that the music bore a copyright
date of 1917. The cover for "The World Is Waiting for the Sunrise"
depicted singer Mary Ford, who was born in 1924. Other pieces of
music included advertisements for songs displaying later copyright
dates. The Fine Arts Librarian consulted with legal counsel about the
status of these items. It was decided that we could include these pieces
of sheet music in the collection if we didn't provide access to the
individual pages that were not in the public domain, an approach that
wouldn't affect the usability of the music itself.
Another consequence of using CARLI's server is that our digital
collections are subject to CARLI's "look and feel" requirements for
uniformity. The consortium allows very little flexibility, as it wants to
maintain a consistent look between the collections of member
organizations. For our first project, this was actually a blessing.
Designing a branded image is a lengthy process requiring resources
(graphic design talent and technological tools and adroitness) and
research (complying with the look and feel requirements of the SIUE
website as a whole). As it was, the decisions I presented before the
CONTENTdm committee were straightforward and simple. I made
some mockups featuring the school colors in various combinations in
the permitted areas; the voting process was fairly painless. I worked
with Academic Computing personnel to get an official logo that
conformed to CARLI's size constraints with the exact color specified
by SIUE marketing guidelines.
Challenges encountered during subsequent digital projects
My second digital collection experience, the digitized presentation
of a Civil War diary, was achieved in collaboration with the Social
Sciences Librarian and a temporary staff worker under her
supervision who had transcribed the entire diary. I learned how to use
the transcription function in CONTENTdm and worked with the staff
worker to render the transcription she had created into files that
CONTENTdm could manipulate, i.e. individual text files with file
names matching the corresponding image files.
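The file-matching requirement can be pictured with a short sketch. It assumes the transcription has already been divided into one block of text per page and that the image file names are known in page order; the function name and the sample file names are hypothetical, and nothing here calls CONTENTdm itself.

```python
import tempfile
from pathlib import Path


def write_page_transcripts(image_names, page_texts, out_dir):
    """Write one text file per page, named after the matching image file."""
    if len(image_names) != len(page_texts):
        raise ValueError("need exactly one transcript per image")
    out_dir = Path(out_dir)
    written = []
    for image_name, text in zip(image_names, page_texts):
        # diary001.jpg -> diary001.txt, so each transcript can be paired
        # with its page image by name alone.
        target = out_dir / (Path(image_name).stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target.name)
    return written


# Hypothetical file names and placeholder text, written to a scratch folder.
with tempfile.TemporaryDirectory() as tmp:
    names = write_page_transcripts(
        ["diary001.jpg", "diary002.jpg"],
        ["(transcribed text of page 1)", "(transcribed text of page 2)"],
        tmp,
    )
```

The length check matters in practice: a single skipped or duplicated page would otherwise attach every later transcript to the wrong image.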
In the spring 2009 semester, I was the instructor of record for a
student's Senior Project course. The student, who had worked in a
library for several years and was considering going to library school
after completing his bachelor's degree, wanted to learn about Dublin
Core metadata and digital collections. Together we designed a project
for him to create a Civil War collection under my supervision using
digitized letters and ephemera loaned to the Library by an emeritus
professor of history. After completing background readings and
papers, the student spent about five hours a week at the Library. He
collaborated with me to make metadata decisions and learned how to
use the CONTENTdm software, successfully completing the project in
a semester's time. Working on this collection revealed a shortcoming
of CONTENTdm in the way it supports managing metadata for
disparate types of materials within a single collection. This project,
which my student ultimately named the American Civil War
Collection, is comprised of three different types of digital entities:
letters, military orders, and songsheets. Adequately describing all
three required a total of 27 different metadata fields. CONTENTdm
does not have the functionality to organize metadata separately into
subsets determined by their relevancy to particular included objects.
Metadata manipulation (mapping, editing, etc.) after uploading is
performed in a single interface in which all the metadata fields are
displayed together. Fortunately in our case the small overall size of the
collection meant coping with this limitation wasn't prohibitively
awkward.
The same spring the library administration hired a second catalog
and metadata librarian, and together we began work on a fourth
digital collection. This project featured digital photographs of
architectural artifacts designed by architect Louis H. Sullivan and
owned by SIUE, accompanied by digitized historic photos of the
buildings on which the ornaments originally appeared. We worked
with the Fine Arts Librarian and her graduate assistant to plan the
organization and presentation of the images and identify the metadata
we wanted to include. The graduate assistant gathered the raw
metadata which my colleague and I translated into controlled
vocabularies. We used the Getty Art and Architecture Thesaurus
(AAT) (Getty, n.d.) for terminology for the ornaments themselves,
materials of construction, and types of buildings of origin,
supplementing the latter with Library of Congress Subject Headings
when we felt it would be helpful. We began populating the digital
collection in April, a process that took four months due primarily to
delays in obtaining some of the images and associated descriptions.
Leveraging CONTENTdm to create a meaningful structure for objects
in this collection proved challenging. We ultimately decided upon
what CONTENTdm calls a monograph structure. A CONTENTdm
monograph is a compound object with hierarchical levels, analogous
to chapters in a book. We organized each of our digital entities to have
two subsets (chapters) of images: artifact images and building images.
Users click on one of these headings to reveal the images in the next
hierarchical level. Although this structure isn't inherently intuitive, we
felt it was the best fit from among the options available in the
CONTENTdm software. The structure works well when browsing the
collection as a whole or via the index boxes we supplied for artifact
type and building of origin, but we are less enthusiastic about how it
translates into retrieval from keyword searches. CONTENTdm has
options for customizing the retrieval display that address some of our
concerns, but the fact that the software isolates document- and page-
level metadata in the search and display customization functionalities
prohibits us from achieving our ideal result.
In summary, the four digital collections that Lovejoy has created
to date using CONTENTdm are:
1. KMOX Popular Sheet Music, comprised of 118 objects and 558
jpg files.
2. William R. Townsend Civil War Diary, comprised of 14 objects
and 356 jpg files.
3. American Civil War Collection, comprised of 9 objects and 40
jpg files.
4. Louis H. Sullivan Ornaments, comprised of 64 objects and 191
jpg files.
The completion of each project was marked by announcing its
availability to the University community and adding a link to the
library website. I also created a catalog record in OCLC for each
collection, and added all four collections to the CONTENTdm
Collection of Collections database (CONTENTdm, n.d.).
Issues to be addressed in future projects
Organizational change has provided a source of external pressure
concurrent with and affecting the progress of our digital initiatives
and priorities. Lovejoy Library's dean left near the end of 2007 after a
long tenure as both Dean of Library and Information Services and
Associate Vice Chancellor for Information Technology. Academic
Computing had reported to the Dean in his latter capacity. Upon the
Dean's retirement, the Provost decided to change the organizational
structure so that the new library Dean would not have this dual
responsibility. Academic Computing merged with the Office of
Information Technology Services and now shares with it a new
reporting structure separate from library administration. The new
system began in July 2008. The interim Director of Technical Services
began exploring a team approach to digitization. His plan centered
around two major initiatives: purchasing a large format scanner for
the library and hiring a digital imaging specialist, which were
accomplished in 2008-2009. Library digitization projects could
consequently be created without relying on Academic Computing
personnel to scan materials. However, all of the aforementioned
projects were digitized by various people before the purchase of the
library's large format scanner and subsequent hiring of the Digital
Imaging Specialist. The team approach has not yet been developed for
producing CONTENTdm digital collections.
In fact, creation of the image files began prior to my involvement
with each of the projects except the first. As a result, a filenaming
convention was never established. We discussed filenaming for the
first project, the KMOX sheet music collection. We decided to use a
transparent method: The images were named using a combination of
the song title, composer name, date, and page number. An example is
ByTheLight_Edwards, Gus_1909_001.jpg. This approach doesn't
support generalization to future projects. As I researched the issue
further, I grew to prefer a more systematic approach to filenaming.
This idea inspired me to create an Image ID field in the metadata for
each sheet music title in the collection, which I populated with an
alphanumeric collection-specific accession number. But the actual file
names corresponding to the jpg files weren't included in the final
metadata: an inadvertent oversight resulting from my inexperience
with how CONTENTdm handles tab-delimited files. The problems
with file names persisted for all four of the projects described
previously in this chapter. The Digital Imaging Specialist is working
with me and the other catalog and metadata librarian to establish a
convention that works with our scanning equipment defaults. We have
decided to adopt a cross-collection systemized convention similar to
that described in the Wisconsin Heritage Online Digital Imaging
Guidelines:
File names for digital masters and derivatives need to be
established before the scanning process. Systematic file naming
helps not only to manage the project, but also ensures system
compatibility and interoperability. It is generally recommended to
assign an eight-character file name and a three-character
extension, e.g. aa000001.xxx. This is sometimes called 8.3. File
names should adhere to some general requirements. They should
be:
• Unique and consistent
• Alphanumeric (consist of only letters and numbers)
• Lowercase
• Free of spaces and tabs
• Numbered sequentially using leading zeroes (i.e. 001, 002,
003, not 1, 2, 3)
The files can be named after an original source collection or per
project, depending on the needs of the local institution. Up to four
letters can represent the project abbreviation or original collection
name, e.g. hf for Harrison Forman Collection or sccl for Shawano
City-County Library. The remaining digits indicate a unique file
number. This is often simply sequential numbers prefaced with
leading zeros. For example, digital images from the Harrison
Forman collection project are named hf000001, hf000002, etc.
(p. 5-6)
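The convention quoted above is regular enough to generate and check by script. The following is a minimal sketch of one way to do so, not part of the Wisconsin guidelines or of our workflow: the function names are hypothetical, and the default "tif" extension is an assumption chosen only for the example.

```python
import re


def make_file_name(prefix: str, number: int, extension: str = "tif") -> str:
    """Build an 8.3 name: a 1-4 letter prefix plus a zero-padded number."""
    if not re.fullmatch(r"[a-z]{1,4}", prefix):
        raise ValueError("prefix must be one to four lowercase letters")
    if not re.fullmatch(r"[a-z0-9]{3}", extension):
        raise ValueError("extension must be three lowercase letters or digits")
    digits = 8 - len(prefix)  # the number fills whatever the prefix leaves
    if not 0 < number < 10 ** digits:
        raise ValueError("number does not fit in the remaining digits")
    return f"{prefix}{number:0{digits}d}.{extension}"


def is_valid_file_name(file_name: str) -> bool:
    """Check a name against the 8.3, lowercase, alphanumeric rules."""
    match = re.fullmatch(r"([a-z]{1,4}[0-9]+)\.([a-z0-9]{3})", file_name)
    return match is not None and len(match.group(1)) == 8


first = make_file_name("hf", 1)
old_style_ok = is_valid_file_name("ByTheLight_Edwards, Gus_1909_001.jpg")
```

Here `first` is "hf000001.tif", matching the Harrison Forman example in the guidelines, while the descriptive name used for the KMOX images fails the check because of its length, capital letters, spaces, and punctuation.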
I reached the end of my digital collection backlog with the
completion of Lovejoy's fourth CONTENTdm-based collection in
September 2009. I subsequently met with some of my colleagues to
brainstorm ideas for additional digital collections. The result was a
fantastic array of proposals employing audio and video files, featuring
collaboration with other local institutions, and creating scholarly
research products on a digital platform. The proposals were presented
to the selection subcommittee. The membership of the subcommittee
had been modified to reflect the organizational changes described
previously in this chapter. The Director of Academic Computing was
no longer a part of the subcommittee and the new Digital Imaging
Specialist had been added. Although the majority of the subcommittee
greeted the new project proposals with enthusiasm, the role of the
selection subcommittee is currently being reconsidered and thus the
project approval process is on hold.
We are planning to purchase a server in cooperation with our IT
department. Not only will this relieve severe storage problems during
digitization workflow, it will also give us the option to explore creating
portal pages to our CARLI digital collections or to host some
collections locally. The Digital Imaging Specialist has a graphic design
background and is highly interested in exploring creative ways to
showcase our collections.
Sorting out the process of green-lighting digital projects and the
graphic and technological design of locally-hosted portals and
collections will doubtless incur long and passionate discussion. The
committee-driven process that is the default for all decisions at
Lovejoy is not a painless one. Consensus-seeking, while attractive in
theory, is impractical in many ways. But it is the culture of this
institution and I suspect the culture of many similar institutions as
well. Some issues along the way to realizing our digital initiatives thus
far have been thoroughly discussed and resolved with thoughtful
regard for the future, and some were hastily addressed with the main
goal of overcoming a stalling impediment. Some of the best ideas
proved insufficiently nimble to adjust to unforeseen developments.
Some of the bad seed sown in the interests of forward motion has yet
to bear the anticipated troublesome crop. Regardless, we are moving
forward into new kinds of projects with a sharper focus on who we are
and how we want to present ourselves.
References
CONTENTdm Collection of Collections. (n.d.). Retrieved December 1,
2009, from http://collections.contentdmdemo.com/
Gertz, Janet. (2007). Preservation and selection for digitization.
Retrieved December 10, 2009, from
http://www.nedcc.org/resources/leaflets/6Reformatting/
06PreservationAndSelection.php
The J. Paul Getty Trust. (n.d.). Art & Architecture Thesaurus Online.
Retrieved December 8, 2009, from http://www.getty.edu/
research/conducting_research/vocabularies/aat/
Library of Congress Authorities. (2009). Retrieved December 1,
2009, from http://authorities.loc.gov/
Lovejoy Library. (2009). Digital rights and permissions. Retrieved
December 1, 2009, from http://www.siue.edu/lovejoylibrary/
about/digital_rights_and_permission.shtml
Wisconsin Heritage Online Digital Imaging Guidelines (Version 2)
(2009, September). Received December 2, 2009, from Wisconsin
Heritage Online Wisconsin Library Services.
From Argentina to Zambia: Capturing the
Digital A to Z's of a Child Art Collection
Kathleen C. Lonbom, Milner Library (Illinois State University)
Abstract
The International Collection of Child Art, residing at Illinois State
University's Milner Library, is a collection of artworks produced by
children and adolescents across a range of cultures and time periods,
primarily mid to late 20th century. This chapter discusses the
collection's background and its role as a culturally expansive primary
source. Information is provided about the Library Services and
Technology Act grant funding awarded through the Illinois State
Library to support the digitization project, Imagine Illinois and
Beyond: Celebrating Creativity Through the Eyes of Our Children!
The benefits and challenges of the digital conversion are discussed.
Finally the chapter will look at alternate methods of image access,
specifically audio description, to facilitate information discovery for
viewers with a print disability such as vision impairment.
Keywords: Academic libraries, Art, Audio description, Children's
art, Cultural heritage materials-digitization, Descriptive metadata,
Grant funded projects, Illinois State Library, Institute of Museum and
Library Services, Library materials-digitization, Metadata, Special
collections, Visual resources.
Collection Background
The International Collection of Child Art (ICCA) is a resource
comprised of artworks created by children and adolescents from
around the world. This cultural heritage collection reflects the visual
expressions of young artists who capture themes from the fantastical
to the familial and a myriad of themes that fall between. The collection
was initiated more than forty years ago at Illinois State University
(ISU), Normal, Illinois, and is now curated and administered by the
University's library. This primary source includes over 8,600
accessioned children's artworks, from approximately 58 countries and
cultures. The collection celebrates the creativity and innovative work
produced by children with a multicultural perspective. The resource
serves multiple audiences including students, academic scholars from
across disciplines, and a wider public interested in viewing, learning
about, and appreciating the imaginative vision that shapes a child's
creative pursuits.
ISU's Milner Library acquired the ICCA in 2000 from the ISU
School of Art, which inherited the resource following the closing of the
University Museum in 1991. The collection had been stored in a
classroom until the college could no longer accommodate it due to
space constraints, lack of support staff, and less than optimal
conditions for storing and maintaining this resource.
Virtual Shift: The Digital Project's Genesis
University libraries holding collections with parallel cultural and
historical value have grappled with similar challenges and
opportunities presented by the digital conversion of a unique
resource. Questions, both philosophical and practical, invite those
embarking on a digital project to entertain a range of considerations
perhaps not previously attached to the physical collection. Members of
the project to digitize the Eastern North Carolina Postcard Collection
recognized the ambiguous nature of assigning subject headings to a
stand-alone image, relatively free of contextual information. The
postcards in this collection were usually accompanied by a text caption,
but did not always have context provided by a monograph (Dragon,
2009). Colorado State University's project to digitize the University
Historic Photographic Collection emphasized the collaborative
process of converting a historic resource under a controlled timeline.
The conversion involved a variety of project partners, including
archivists, metadata librarians, and digital project managers, learning
to speak each other's professional language (Hunter, Legg & Oehlerts,
2010). A Latin American political poster collection, part of the
University of New Mexico Libraries Center for Southwest Research
and Special Collections, speaks to digitally documenting the transient
nature of a resource by preserving ephemera such as posters. Similar
to children's art, posters are not typically created to last through time,
but document a specific and often meaningful moment in time
situated at the edge of societal mainstream perspective (Stephenson,
2006). Clifford Lynch's discussion of digitizing cultural heritage
materials comments broadly on the discovery aspect of placing
resources in an open electronic environment where unexpected and at
times serendipitous communities form around such a collection.
Lynch posits that it is the objective of a digital library, expansive in its
capacity to "enable and facilitate implicit communication," to provide a
construct for the community building that develops around a
collection (Lynch, 2007).
The move to consider digitization of the ICCA was rooted in
curricular needs to accommodate a generation of users already vested
in electronic access. By virtue of the resource's sheer size, aging
condition, and location, physical access is limited. The collection of
two dimensional artworks is stored in a multi-use university
warehouse that also serves as the library's storage site for a collection
of less frequently used volumes and is the home of University
Archives. The warehouse is off campus and largely off limits to the
public. Although slides of the artworks were created in the 1970s to
support teaching and study of the collection, the analog format was no
longer a viable option to support use of the collection in the 21st
century academic environment. Included in the collection are sixteen
thematic traveling exhibits that have been displayed widely and have
garnered attention for the resource. Several hundred of the slides,
selected from the collection's traveling exhibits, were converted to
digital format in 2003, but overall the condition and quality of the
slides was questionable and each image required color correction. The
conversion of this small sample to a digital format brought attention
to the collection when the images were mounted on the ICCA website
and provided the spark that ignited the pursuit of funding to digitize
the collection. Art, art education, English, and children's literature
faculty actively using the collection in classes were supportive
advocates for moving the resource into a fully accessible digital
format.
Setting the stage for digital conversion in the fiscally constrained
environment of a public university requires strategic use of limited
funding for library resources and creative thinking to move forward
with a digitization project. Milner Library began researching grant
options to move forward with the digital conversion of the ICCA in
response to faculty feedback.
The value of a grant award is not limited to the funding awarded
to support a project. Significant gains are gleaned from a successful
external grant award including the opportunity to work outside your
own institution with the awarding agency, related publicity,
promotion, and recognition of a collection. The external recognition
has the potential to bring additional funding as a project's status is
heightened, providing a scaffolding effect on which to build and
enhance the project and its outcomes. Additionally, faculty librarians
working towards tenure and promotion are keenly aware of the
professional distinction associated with the potential of a funded
competitive external grant and the implications for building a record
of scholarly and creative activity.
Grant awards also provide resources to fund hiring student
assistants who collaborate and contribute to a project's success. In an
academic environment a grant funded library digitization project can
advance a student's academic career by offering opportunities to
work on the multiple facets of such a project. Grant funded
projects, such as the proposed digitization of the ICCA, offer a wide
range of opportunities for students from a variety of academic
disciplines, testing and developing students' strengths as researchers
in art, art history, history, foreign language and culture. Students who
have worked with the ICCA come from a variety of majors: art, art
history, arts technology, English, and language. A student can also
learn time and project management skills and how to support the
goals and objectives of a project through their own contributions.
Writing a successful grant proposal reflects a commitment of
time, thought, and energy while providing an opportunity to compete
and be recognized through the external agency awarding the grants.
The Illinois State Library's (ISL) Library Services and Technology Act
(LSTA) grant program, using funding made possible by a grant from
the Institute of Museum and Library Services (IMLS), appeared to be
a fitting choice for a proposal submission. The IMLS is the main
provider of federal support for museums and libraries in the United
States. Working with state and local organizations, it pursues a primary
mission of strengthening the capacity of these institutions to link people,
information, and ideas in a meaningful way. The fiscal year 2009
LSTA grant offerings by the ISL were available in three categories:
technology programs and services, resource sharing, and digital
imaging projects. The digital imaging category suited the needs for the
potential ICCA digitization project.
The ISL's LSTA grant application and review process is
competitive and requires the principal investigator to make the case
for the value of the collection being considered for funding. Grant
reviewers (experienced with digitization projects, past principal
investigators, or project directors) evaluate proposals using a review
rubric. The rubric is made available to grant writers and offers clear
language about components of the grant application: project
description, action plan, target audience, outcomes, timetable,
personnel, and project sustainability (LSTA Grant Review Criteria,
2010). The ISL provides further support to grant writers by hosting
workshops designed specifically for digital imaging projects prior to
the grant proposal deadline. The workshops cover planning the grant
project, elements of the proposal, and the evaluation rubric.
Developing the grant proposal afforded an opportunity to closely
examine the ICCA's forty-year history at ISU and to justify the
significance of the collection in terms of how it related to historical
and cultural trends on a state, national and international level. The
resource, initiated by ISU's former University Museum Director, Dr.
F. Louis Hoover, began with a collection of over 300 artworks
gathered from the children of Illinois. Artworks from North American
countries represent 55% of the current collection. Although ultimately
international in scope, the seeds for the collection were sown with
creative works by the children of Illinois. With support from the
University, the collection's scope grew under ISU art education
professor Dr. Barry Moore (now emeritus), including artworks from
children and adolescents of six continents, Antarctica being the only
continent not represented. Relationship building and networking with
international organizations such as the Christian Children's Fund
reflected the University's longstanding record of supporting global
engagement through activities including study abroad programs for
students and welcoming international students to the ISU community.
The LSTA grant application required detailed attention to
planning the ICCA digital conversion including proposed timelines,
personnel, and supporting resources. The project planning guidelines
suggested through the proposal process largely reflect the detailed
information for the digitization of cultural heritage materials
thoroughly outlined in the Digitization Activities Project Planning and
Management Outline document disseminated by the U.S. National
Archives and Records Administration (Still Image Working Group,
2009).
Milner Library's Digitization Center, established in 2005 to
primarily provide for the digitization needs of the University
community, proved the logical choice for the digitization of the ICCA.
The Center, equipped to digitize a variety of analog formats, provided
contractual services for successful projects funded by LSTA grants
awarded through the ISL: the Towanda History Project, a partnership
between the Towanda District Library and Historical Society, and the
Native American Collection, held at the McLean County Museum of
History. These projects established the Center's reputation for
partnering on LSTA grant funded digitization projects. Additionally,
Milner Library and ISU's School of Art work collaboratively with the
Center to produce an image database, ILSTUDIA, providing images
for art, art history, design, and visual culture hosted on the library's
CONTENTdm server. Further experience was gained with the
digitization of two dimensional artworks when the Center digitized
more than 250 prints from the New Editions Workshop project
funded by the School of Art.
The grant proposal emphasized the collection's value to provide
primary source materials across disciplines including art, history,
social sciences, education, English and psychology. The educational
value of the collection was already established by the documented
study and use of the resource by researchers (ICCA: Research, 2008).
The digitized collection was not meant to replace the original
artworks, but to facilitate access to a larger audience. Titled Imagine
Illinois and Beyond: Celebrating Creativity Through the Eyes of Our
Children! the grant proposal was submitted in May 2008 to the ISL's
LSTA digital imaging grant program. In August the total amount
requested, $85,934.00, was awarded for the project to digitize 8,600
accessioned pieces in the ICCA. The grant cycle began October 1, 2008,
with all funds to be expended by June 30, 2009.
Constructing the Project
The original grant proposal was submitted, approved, and awarded,
allotting the majority of the funding for contractual services from the
Digitization Center at Milner Library. Although the grant proposal was
reviewed and approved at all required levels at ISU, the University's
Grants Accounting and Comptroller's Office requested a budget
amendment be filed after the grant was awarded, belatedly noting the
Digitization Center named for contractual services could not be
designated as a separate agency. An amendment to the grant budget
needed to be filed to reflect expenditures to support the project. The
budget amendment was submitted and approved by the University
and the State Library. It was an unexpected hurdle to cross and
created delays hiring grant funded personnel and ordering equipment.
Additional complications were created by the reorganization of the
Digitization Center, just weeks prior to the grant awards being
announced. The director of the Digitization Center was reassigned to
work full time on a separate external grant funded project, essentially
leaving the ICCA project without a project director and the Center
without a director. Without prior notice or planned options for time
release from academic workload, the principal investigator took on the
additional responsibility of project director for the digitization project.
As author of the grant, the principal investigator had drafted the
project planning that was already in place and the funded grant
moved forward.
Developing the Data
When Milner Library began administering the ICCA, the resource
came with a textual database holding museum generated information,
as available, about the artworks including object identifiers, titles of
the pieces, date created, artist gender, artist age, artist location,
provenance, materials, dimensions and subject headings. The textual
database laid the groundwork for collaboratively developing metadata
schema, mapping, and standards used in the digital image database.
Due to personnel turnover and position reassignments three different
metadata librarians worked on the project during the planning phase
of the grant, through the grant cycle, and after the grant cycle came to
an end. The library supported filling the vacated positions so that
metadata development continued to progress.
Initially, guidelines for the LSTA grant required funded projects
to upload their digital files and related metadata into the Illinois
Digital Archives for dissemination. Milner Library was already using
CONTENTdm to manage several other digital collections, and it is the
choice of the Consortium of Academic and Research Libraries in
Illinois. CONTENTdm has proven to be a reliable data storage
solution for a variety of formats including images, documents, and
audio. This is not a small consideration for an institution, not only due
to cost, but also to preserving cultural heritage collections with
concerns for moving the digital data attached to these resources
reliably and robustly into the future (Van Den Bosch, Van Den Herik,
& Doorenbosch, 2009).
Vocabulary for the subject headings was most closely aligned with
the Getty Art and Architecture Thesaurus (AAT). The project team
continues to add and adjust subject headings as needed using the AAT
as a guide, although not all terms are included in the thesaurus. For
example, the terms "angels" and "ghosts" are excluded from the AAT but are
useful terms to apply in a children's art collection. Metadata for the
collection was scrutinized closely after the images and information
were matched and viewable simultaneously. Gaps in information were
revealed. Descriptive metadata, such as titles for the artworks, were
occasionally truncated due to the migration of data from one database
format to another. This was an irregular occurrence which was
remedied by consulting the original records or the artwork to
complete the information.
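The truncation check described above can be sketched in a few lines. This is a minimal illustration, not the project's actual procedure: the record identifiers, titles, and the function name find_truncated_titles are hypothetical, and the sketch assumes both the original and the migrated records are available as simple identifier-to-title mappings.

```python
def find_truncated_titles(original, migrated):
    """Return IDs whose migrated title is a shorter prefix of the original title.

    A record that is missing from the migrated set is also flagged,
    because an empty string is a prefix of every title.
    """
    flagged = []
    for object_id, full_title in original.items():
        new_title = migrated.get(object_id, "")
        if new_title != full_title and full_title.startswith(new_title):
            flagged.append(object_id)
    return flagged


# Hypothetical sample records (not taken from the ICCA database).
original = {
    "obj-001": "Market Day in My Village with My Family",
    "obj-002": "My School",
}
migrated = {
    "obj-001": "Market Day in My Vill",  # cut off during migration
    "obj-002": "My School",              # migrated intact
}

print(find_truncated_titles(original, migrated))  # ['obj-001']
```

Flagged records would still need to be completed by hand from the original records or the artwork, as the text describes.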
A related grant was written concurrently with the LSTA proposal
requesting funding from the University's Research Grant program to
support additional enhancement and development of descriptive
metadata fields for the collection. For example, as the collection grew,
accompanying artists' comments were documented on old key punch
cards, sometimes typed, sometimes handwritten, in the 1970s and
1980s by staff from the University Museum. The paper records, stored
with the collection in the offsite library storage facility, provided no
viable method of access for researchers. The awarded grant provided
funds to hire a student who reviewed all paper records and
transcribed more than 600 comments into a spreadsheet. Providing
access to accompanying artists' commentary in addition to the images
themselves provides researchers enhanced primary source material.
When paired with the image of the artwork, the accompanying
artist commentary enriches the descriptive metadata available to users
and also potentially provides illuminating context for the image
(Figure CHILD-1). The commentary from the young artists had not
been made fully available in an accessible format to users in the past.
This same grant also partially funded student assistants to transcribe
and translate a collection of post-World War II German artworks
donated to the ICCA in the 1970s.
Figure CHILD-1. To Live Under the Sea, boy, 11 years old, Philippines, 1976.
© International Collection of Child Art, Milner Library, Illinois State University,
Normal, Illinois. Used with permission. Artist comments: "I wish to be friendly
to all fishes under the sea so I could see the beauty and surroundings under
the sea. I wish to stay forever under the sea. I will ride on the back of the
shark. I will help the poor people and I will give them pearls."
After the LSTA grant was fulfilled, a University Research Grant
was awarded to support an ongoing project to write concise, one- to
two-sentence descriptions for the digital images from the ICCA.
Descriptions were a missing piece in the information provided by the
textual database. The accompanying descriptions will serve as an
additional path to the end user, enhancing access and discovery. By
Dublin Core definition, the description field provides a textual
description for an image, such as a piece of children's artwork in this
collection (Dublin Core, 2009). End-users, including students,
educators, and researchers, will benefit from having additional access
points available as supporting and valuable descriptive metadata for
the digital collection of multicultural children's artworks. Although
the majority of the 8,600 artworks converted under the LSTA
digitization project have at least one term in the subject field for
descriptive metadata, no information is available in the description
field. The lack of information in this field limits the end-user's
capacity to fully explore the resource for the purposes of teaching,
learning, and research.
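The gap described above, records that have subject terms but an empty description field, is easy to audit programmatically. The following is a minimal sketch under stated assumptions: the records are represented as plain dictionaries keyed by Dublin Core element names (identifier, title, subject, description), and the sample values and the function name missing_description are hypothetical rather than drawn from the collection.

```python
def missing_description(records):
    """Return identifiers of records whose description field is empty or absent."""
    return [
        record["identifier"]
        for record in records
        if not record.get("description", "").strip()
    ]


# Hypothetical records using Dublin Core element names as keys.
records = [
    {"identifier": "obj-001", "title": "My School",
     "subject": ["schools"], "description": ""},
    {"identifier": "obj-002", "title": "Harvest",
     "subject": ["agriculture"],
     "description": "A crayon drawing of a family gathering crops."},
    {"identifier": "obj-003", "title": "Untitled", "subject": ["animals"]},
]

todo = missing_description(records)
print(todo)  # ['obj-001', 'obj-003']
```

A list like this could serve as a simple work queue for the description-writing project.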
The library's digital imaging specialists captured images of the
artworks with the center's Betterlight Super 8K-HS camera working
on a Tarsia 40 x 60 inch vacuum table which accommodated most of
the pieces. Required specifications for scanning the artwork were
provided by the State Library and were compatible with other digital
projects the library had developed. Each digitized artwork has an
uncompressed TIFF master file (300 PPI), compressed JPEG image
files (300 PPI), and a thumbnail GIF image (72 PPI). The digital
imaging specialists and project student assistants resized and edited
image files using Adobe Photoshop CS2.
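To give a sense of scale for the uncompressed masters, the arithmetic below estimates pixel dimensions and file size at 300 PPI. It is only a sketch: the 18 x 24 inch artwork size and the 24-bit RGB color depth (3 bytes per pixel) are assumptions for illustration, not specifications stated by the project, and TIFF header overhead is ignored.

```python
def scan_dimensions(width_in, height_in, ppi):
    """Pixel dimensions of a scan for a given physical size and resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Approximate size of uncompressed image data (assumed 24-bit RGB)."""
    return width_px * height_px * bytes_per_pixel


# Hypothetical 18 x 24 inch artwork captured at 300 PPI.
width_px, height_px = scan_dimensions(18, 24, 300)
size = uncompressed_bytes(width_px, height_px)

print(width_px, height_px)  # 5400 7200
print(size)                 # 116640000 bytes, roughly 117 MB
```

Multiplied across thousands of artworks, masters of this size explain why server storage and network speed mattered to the workflow.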
The Road More and Less Traveled
The Digitization Center's location required the children's artwork to
be transported from the university warehouse approximately three
miles off campus to the library. Over the course of the grant cycle,
approximately forty weeks, weekly trips were made to move artwork
to the library and then return the artwork to the warehouse.
Although the project was located in the Midwest, weather never played a
significant role in transporting the artwork, which was no small consideration.
A vehicle was able to drive into the warehouse for pickups, and
deliveries to the library were made at a service door with a large
overhang. The two dimensional works were transported in portfolios
designed and crafted by the library's preservation staff. Each portfolio
held 25-50 pieces depending on the size and material from which the
artworks were made.
In the last months of the grant cycle the project team began
processing the traveling exhibits, about 500 artworks, framed and
stored in shipping crates. These pieces required extra time and
attention because many of the pieces had to be unframed, digitized,
and then reframed. The exhibits posed more of a challenge for
transporting back and forth because transportation had to be
arranged in advance for University facilities staff to deliver the crates
to the library and then return them to the warehouse. The process of
transporting, uncrating, digitizing, reframing, general handling, and
re-crating the artworks was more time consuming than anticipated.
Library preservation staff worked with the project team from the
planning phase and throughout the project cycle helping to identify
workable solutions to related preservation and conservation issues.
The overall condition of the artwork in the collection is fair,
considering the age of the collection (some artworks are more than fifty
years old) and the materials used for children's artwork, which
typically do not include archival-quality paper. A potential problem
considered during the planning phase was the possibility that some of the
artwork might have preservation or conservation concerns that would
need attention. Working collaboratively with the library's preservation
staff, the digitization staff handling the artwork created a rubric of
potential preservation concerns and, as each piece was handled,
documented anything about its condition that was unusual or in need of
attention. As a result, the collection now has
a record describing the condition of each piece that was marked for
conservation attention. For example, several hundred of the artworks
were in acidic mat board frames that necessitated removal as the
pieces were handled. Excessive amounts of tape made this a more
tedious and time-consuming process than anticipated. Typical notes
documented included: some water damage, tacky oil paint, fragile
materials, acidic paper, tears, glue, tape, etc. The digitization staff
documented the preservation concerns throughout the process and
consulted closely with preservation staff to make bridge fixes as
needed. This documentation will be used to pursue grant funding for
conservation and preservation of the artworks sometime in the future.
Miscellaneous Malfunctions and
Positive Project Outcomes
Unanticipated technical problems can and will happen throughout a
project period. The positive outcomes, however, provide a strong
counterbalance.
Scanning came to a halt for several days just one month into the
LSTA project, when the Betterlight Super 8K-HS camera used to
capture the artwork malfunctioned and had to be shipped to
California for repair. The library's main server containing all of the
project image files went completely down toward the end of the grant
cycle resulting in six work days that proved difficult for processing,
digitizing, and scanning the artwork. The project team adapted, saving
files to the work station hard drives for temporary storage, until the
servers were restored. After a change in library network configuration,
the digitization center staff encountered an unexpected increase in the
amount of time it was taking to open a TIFF file from the server.
Moving image files between servers or from the server to a local hard
drive was slowed significantly.
Positive outcomes of the ICCA grant project and the increased
level of digitization activity directly and indirectly influenced and
contributed to changes in the Digitization Center. Repeated bending
and lifting at the low level of the pre-fabricated scanning bed created
ergonomic concerns. A frame to raise the scanner bed was designed
and fabricated, raising the scanning bed to 30 inches thus improving
the ergonomics of the setup and creating a more comfortable
workflow. The library committed to supporting additional space by
moving equipment and personnel from a cramped, shared area (which
staff had uncomplainingly made functional) to a larger work space
dedicated solely to the needs of the Digitization Center. The move also
resolved the slow handling of image files, as the network ports
and servers were upgraded from 100-megabit Ethernet to
gigabit Ethernet in the renovated space. Staff increased as a
second digital imaging specialist hired under the LSTA grant
continued employment full time after the grant cycle ended. Two ISU
students from the arts technology program were hired to work part
time on the LSTA project. They assisted with scanning artwork and
resizing images, and helped transport the artwork from the warehouse
to the digitization center.
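The effect of the network upgrade on large image files can be estimated with simple arithmetic. The sketch below reads the upgrade as a move from 100 Mbit/s to 1 Gbit/s links, which is an assumption (Ethernet speeds are normally quoted in bits per second). The 100 MB file size is also hypothetical, and the figures are idealized best cases that ignore protocol overhead, disk speed, and other traffic.

```python
def transfer_seconds(file_bytes, link_bits_per_second):
    """Idealized time to move a file over a link, ignoring all overhead."""
    return file_bytes * 8 / link_bits_per_second


FILE_BYTES = 100_000_000  # hypothetical 100 MB uncompressed TIFF

fast_ethernet = transfer_seconds(FILE_BYTES, 100_000_000)       # 100 Mbit/s
gigabit_ethernet = transfer_seconds(FILE_BYTES, 1_000_000_000)  # 1 Gbit/s

print(f"100 Mbit/s: {fast_ethernet:.1f} s")   # 100 Mbit/s: 8.0 s
print(f"1 Gbit/s:   {gigabit_ethernet:.1f} s")  # 1 Gbit/s:   0.8 s
```

Even in this best case, the slower link adds several seconds to every file opened or moved, which accumulates quickly over thousands of images.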
Areas of Growth
In 2007 Illinois Public Act 095-0307 altered the fabric of State
awarded digitization grants. The newly enacted Illinois Information
Technology Accessibility Act (IITAA) requires the preparation of
audio descriptions to accompany digital images to enhance access for
individuals with a print disability such as vision impairment. Due to
the incongruous timeline of the fiscal year 2009 grant awards and the
enactment of the IITAA, FY09 grant recipients, Milner Library
included, were only asked to submit audio files to accompany twelve
digital images. Funded applications henceforth are required to
prepare audio descriptions for 100% of digital images. After attending
an audio description (AD) workshop hosted by the Alliance Library
System, the project director became intrigued by the concept of
providing an alternate source of access to a visual resource. A small
grant from the University was secured and provided seed money to
hire a student assistant and begin researching the implications and
mechanics of audio describing a digital image collection. Using the
ICCA digital collection as a beta model, a pilot project has been
initiated by the project director with the primary objective to
investigate the challenging aspects of developing audio descriptions
for an image collection.
Audio description is defined as a "narrative technique that makes
visual images more accessible to blind and low-vision people by
producing audible written descriptions of non-verbal visual
information" such as the digital images in the ICCA. The history of AD
is rooted in the performing arts and dates back to the 1960s when a
vision impaired employee from the United States Department of
Education suggested preparing audio descriptions for films as well as
the captioning already being provided for hearing impaired
individuals. The idea was not supported, however, until the 1980s, when
advocates gathered and initiated an ongoing program sponsored by
the Washington Ear, a group that recorded newspaper and magazine
articles for the vision impaired and went on to provide
descriptions for performing arts venues (Snyder, 2008).
Employing basic principles for preparing AD suggested by the
workshop, work began on writing transcripts for a selected number of
images from the ICCA. The transcripts are brief, each providing a one-
to two-minute description averaging 200 words. Descriptive language,
not analysis, is used to give a listener a concise
understanding of the image. Each description
introduces the image, identifies the primary theme or
elements, and then describes in detail what is featured in the image
(Figure CHILD-2).
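The relationship between transcript length and listening time can be checked with a rough calculation. This is a sketch only: the speaking rate of 150 words per minute is an assumed figure for illustration, not one given by the workshop or the project, and the function names are hypothetical.

```python
def word_count(transcript):
    """Count whitespace-separated words in a transcript."""
    return len(transcript.split())


def narration_seconds(words, words_per_minute=150):
    """Estimated reading time in seconds at an assumed speaking rate."""
    return words * 60 / words_per_minute


# A 200-word transcript at the assumed rate of 150 words per minute.
seconds = narration_seconds(200)
print(seconds)  # 80.0

sample = "This artwork was created by a 12-year-old boy."
print(word_count(sample))  # 8
```

At that assumed rate, a 200-word transcript runs about 80 seconds, which falls inside the one- to two-minute range the text describes.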
Figure CHILD-2. A Beautiful Afternoon with Birds, boy, 12 years old,
Vietnam, 1972. © International Collection of Child Art, Milner Library, Illinois
State University, Normal, Illinois. Used with permission.
Audio Transcript: This artwork, a painting titled A Beautiful Afternoon with
Birds Returning to Nests was created by a 12-year-old boy from Vietnam in
1972. The centerpiece of this painting — composed primarily with blue hues —
is two birds flying across a lightly speckled sapphire blue sky. A larger bird is
painted in the upper left side of the artwork closely followed by a smaller and
similarly painted bird. The beaks have only been suggested by a jagged line
and the visible eye of each bird is a simple white dot. The avian bodies are
sleek and graceful with plumage that subtly moves from light to deep
turquoise. The birds blend closely with their inky blue-black shadows
emphasizing their forked tail feathers, reminiscent of long fluttering coat-tails.
This image is published by Milner Library at Illinois State University as part of
the International Collection of Child Art Digital Collection.
Approaches to uncovering resources and options for developing,
writing, and recording narratives for images, securing grant funding, and
working with units across the university are being explored. One option
being initiated is building collaborative partnerships with teaching
faculty to integrate the creation of audio descriptions for digital images
in the child art collection into course curricula in related disciplines.
The outcome of the pilot project research, still in the beginning stages,
will inform future digital imaging initiatives integrating audio
descriptions with digital images. The children's artwork presents a
captivating invitation to construct an accessible and meaningful
representation of an image with language.
Conclusion
When discussion about digitizing the ICCA began, the motivating drive
behind the idea was to provide improved access to support teaching,
learning, and research. With approximately 97% of the collection
digitized at the time of this writing, the ultimate objective of the
project has been fulfilled, although outcomes have expanded and been
influenced by a fluctuating climate of change that involved personnel,
equipment, and content (ICCA Digital Collection, 2009). Because
change is a constant in this environment, a key component of a
digitization project is a creative, cohesive, and flexible project team
invested in the project. It is impossible to deny the overarching appeal
this collection of children's art extends to all, including the staff and
faculty across multiple library units who contributed to and supported
the project. From a pedagogical perspective, the project provided an
opportunity to engage numerous ISU students who contributed their
time and talents to multiple aspects of the process including collection
maintenance, digitization skills, translation of artworks, conservation
work, and transcription of metadata. The students who worked with
the collection were thrilled to have the experience of working with
primary source materials and were quite excited to be a part of the
process that brought the collection to digital life. In the academic
environment, and especially at ISU, which prides itself on placing the
learner at the heart of teaching and scholarship, the level of student
engagement was an unanticipated and truly positive outcome.
The project will continue with the ongoing refinement and
enhancement of descriptive metadata, including expanded subject
headings and description development. As the collection's identity
shifts to the virtual, the responsibility to assure optimum access and
searchability for continued use and research of this resource will
remain at the forefront of its mission. Broadening the scope of access
for a range of users by providing audio access will continue to be
explored as the project moves into classrooms and provides
collaborative opportunities for students to continue to contribute in
meaningful ways to the project. The library will consider the
direction in which other cultural heritage collections are moving in the
virtual world (Ronchi, 2009). Embracing hypermedia to push the boundaries
of the collection through virtual storytelling, gallery exhibits, and
reconstructing the creative, historic, and cultural environment of a
child artist are all possibilities. Ultimately this will serve the goal of
remaining a meaningful and vibrant teaching and learning collection
that will continue to inspire users researching the visual expressions
of children.
References
Audio description Illinois. (2009). Retrieved March 13, 2010 from
http://www.alsaudioillinois.net/
Best practices for creating digital files. (2010). Retrieved March 9,
2010 from http://www.cyberdriveillinois.com/departments/
library/what_we_do/servicestechnologygrant.html
Dragon, P. M. (2009). Name authority control in local digitization
projects and the Eastern North Carolina postcard collection.
Library Resources and Technical Services. 53 (3), 185-196.
Dublin core metadata initiatives; Dublin Core metadata element set,
version 1.0: Reference Description. (2009). Retrieved March 15,
2010 from http://dublincore.org/documents/1998/09/dces/
Educating Illinois 2008-2014: core values. (2008). Retrieved March
22, 2010 from http://www.educatingillinois.ilstu.edu/
plan_sections/core_values.shtml
Hunter, N.C., Legg, K., & Oehlerts, B. (2010). Two librarians, an
archivist, and 13,000 images: Collaborating to build a digital
collection. Library Quarterly, 8 (1), 81-103.
Illinois digital archives guidelines for images. (2010). Retrieved
March 14, 2010 from http://www.idaillinois.org/cdm4/guidelines/
Illinois general assembly. (2007). Illinois information technology
accessibility act. Retrieved on March 14, 2010 from
http://ilga.gov/legislation/publicacts/95/PDF/095-0307.pdf
Illinois state library: Library services technology act review criteria
and rubric. (n.d.). Retrieved on March 22, 2010, from
http://www.cyberdriveillinois.com/departments/library/what_we_do/pdfs/lstaio_rubric.pdf
Institute of museum and library services: our mission. (n.d.).
Retrieved on March 22, 2010, from
http://www.imls.gov/about/about.shtm
International collection of child art digital collection. (2009).
Retrieved on March 21, 2010, from
http://tempest.lib.ilstu.edu/index_icca.php
International collection of child art: research. (2008). Retrieved on
March 20, 2010 from http://www.library.ilstu.edu/icca/research/
Lynch, C. (2007). Digital collections, digital libraries, and the
digitization of cultural heritage information. In D.Kresh, (Ed.) The
Whole Digital Library Handbook. Chicago: American Library
Association.
Stephenson, N. K. (2006). Preserving dissent: The Sam L. Slick
collection of Latin American and Iberian posters. Art
Documentation. 2 (1), 20-24.
Van Den Bosch, A., Van Den Herik, J., & Doorenbosch, P. (2009).
Digital Discoveries in Museums, Libraries, and Archives: Computer
Science Meets Cultural Heritage. Interdisciplinary Science
Reviews, 34 (2/3), 129-138. doi:10.1179/174327909X441063
Ronchi, A.M. (2009). eCulture: Cultural content in the digital age.
New York: Springer.
Snyder, J. (2008). Audio description, the visual made verbal. In J.
Cintas (Ed.), The didactics of audiovisual translation (192-197).
Philadelphia, PA: John Benjamins Publishing Company.
Still Image Working Group. (2009). Digitization activities project
planning and management outline. U.S. National Archives and
Records Administration. Retrieved March 12, 2010, from
http://www.digitizationguidelines.gov/stillimages/documents/
Planning.html
Acknowledgments
Milner Library ICCA Digitization Project Team: Sara Caldwell,
Digitization Center; Jim Caselton, Facilities; Anita Foster,
Bibliographic Services; Ross Griffiths, Preservation; Erica Holden,
Digitization Center; Christina Horna, Systems; Krena Hoyt, Systems;
Leta Janssen, Administration; Heather Kosur, Preservation; Jason
Paul, Systems; Patrice Andre Prud'homme, Bibliographic Services;
Toni Tucker, Administration; Sharon Wetzel, Administration
Milner Library student research and digitization project
assistants: Daniel Abdalla, Stephanie Finch, Sheila Majumdar, Peter
Nelson, Gina Pantone, R.J. Tortoriello
Additional appreciations to Cheryl Asper Elzy, former Dean of
University Libraries, Dr. Richard Satchwell, former Director,
Digitization Center, and Alyce Scott, Digital Imaging Program
Manager, Illinois State Library.
Special Collections, Digitization, and the
Classroom: A New Model
Mark Phillipson (Columbia Center for New
Media Teaching and Learning)
Michael Ryan (Columbia University Rare
Books and Manuscripts Library)
Abstract
The Black Radical Archive is a small pilot project at Columbia
University that leveraged digitization to involve students in the
discovery and description of heretofore hidden collections
(http://blackradicalarchive.ccnmtl.columbia.edu/). This project is the
result of innovative collaboration between archivists, educational
technologists, a faculty member deeply engaged with archival
collections, and his students— who made selections of their own and
added to the digital archive. In this chapter we describe cross-
divisional support for the project, its implementation in a Spring 2009
seminar entitled "Black Radicalism and the Archive," and lessons
learned from the informal, just-in-time digitization intrinsic to course-
driven repository building.
Keywords: Activism, Activists, Archives, Customization, Digital
repository, Teaching, Drupal, Hidden collections, Informal
digitization, Learning, Multimedia, Pedagogy, Processing, Repository,
Special collections, Support, User-contributed.
Introduction
Special collections in many ways define the character of an academic
library. They are rich and sometimes undiscovered islands of unique
materials amid an ocean of more generally available information
resources.
Locally held archival materials help define the identity and
character of their parent institutions, attracting and nurturing
research affiliations and communities. Faculty members who have
actively benefited from direct access to such materials in their own
research are looking for practical ways to involve their students in the
experience and excitement of working directly with primary source
materials.
In this chapter, we discuss a pilot project at Columbia University,
the Black Radical Archive (2009), that sought to integrate special
collections into the classroom — redefining some traditional notions
of the academic archive and its use. This project supported a Spring
2009 graduate seminar's focus on three special collections held by
Columbia University's Rare Books and Manuscript Library (RBML). 1
It facilitated a deeper level of materials-based discovery and research,
providing students the means to discover, categorize, annotate,
digitize, and share important holdings in the collections. In the
process, the visibility of the holdings and archival practice was raised,
along with further prospects for drawing on archives to support
teaching and learning.
Though digital surrogates are sometimes faulted as a replacement
for tactile contact with original material, the Black Radical Archive in
fact leveraged digitization to encourage hands-on engagement. At the
height of the project's implementation, the Black Radical Archive
shifted in function from a repository of digitized items to a hub for
uploads of additional items discovered and digitized by students. The
project, then, emphasized the importance of physically inspecting
unique materials, even as it cultivated the advantages that digital
surrogates offer for convenient and repeated inspection, communal
access, and non-invasive annotation.
1 RBML's holdings and activities may be seen online at
http://www.columbia.edu/cu/lweb/indiv/rbml/.
Exposing the Collections
When he conceived of a graduate-level seminar exploring the
intersections of activism and archive called "Black Radicalism and the
Archive," Brent Edwards, Professor of English and Comparative
Literature at Columbia, tailored it to three collections held by RBML:
the Hubert H. Harrison Papers, a fully cataloged collection with a
published finding aid; the C.L.R. James Papers, a collection of
heretofore disparate collections that was in the process of being
assembled as the seminar met; and the Amiri Baraka Papers, a large
unprocessed collection with only a preliminary carton survey. Despite
these different levels of cataloging and indexing, Edwards wanted his
students to discover, categorize, annotate, and compare material from
all three collections— developing primary materials-based research
skills in the process.
Edwards's course occurred at an opportune time for both RBML
and the students in the seminar. Special collections librarianship has
changed markedly in its emphases during the past decade or more.
Without losing its curatorial, collections-based focus, the field has
invested heavily in promoting outreach and in directly supporting the
work of teaching and learning on campuses. 2 Special collections units
are now more open to a diverse constituency than they used to be,
more concerned with creating a broad base of users. Outreach
librarians are now common in special collections units, as are class
and seminar rooms. As library staff collaborate more with faculty on
course design, primary source materials are increasingly integrated
into undergraduate and graduate courses. At Columbia and many
other institutions, curators actively reach out to faculty for the
purposes of better integrating source materials into curricula; RBML
hosts two or three classes per day during the academic term, each of
which uses rare books, manuscripts, documents, and other materials
from the collections.
2 For background on these efforts, see Traister, D. (2003), Smith, S.
(2006), and Association of Research Libraries (2008).
As a result of such outreach efforts, archival materials are better
integrated into courses as enrichment (library staff presentations of
source materials being studied in a course in later editions and
formats), research consultation (in connection with assignments
requiring students to consult books or manuscript materials in special
collections), or surrogate access (digital facsimiles of selected special
collections material made available to students through course
management systems, or CMSs). When Professor Edwards
approached RBML about his Black Radicalism seminar, we assumed
that he would be interested in the classroom support that RBML is
now accustomed to providing: that is, a review of sources pertinent to
the course, a display of some of them, and some arrangement for
ongoing access to select materials in the RBML reading room or the
course CMS. We were wrong.
Professor Edwards was thinking outside the box — or, more
accurately, he was thinking deep inside the box, the archival storage
unit. Edwards was interested in providing students with access to
primary source materials before they were organized for use. He
wanted his students to confront and try to make sense of historical
objects that had not been rearranged by archivists. Of course it is
common practice in research libraries not to open collections for use
until they have been fully processed. But again, Professor Edwards
approached RBML at an opportune time. Like other large research
libraries, Columbia has its own formidable backlog of unprocessed
and underprocessed archival and manuscript collections. Addressing
these arrearages became an ARL priority in the late 1990s, and the
subsequent momentum to provide new energy and funding streams
resulted in the "Hidden Collections" initiative that continues to play a
vital role in our research libraries. 3 Echoing ARL recommendations to
"connect the exposure of hidden collections to ARL's strategic priority
for Research, Teaching, and Learning" (Special Collections Task Force
Final Report, 2006), Professor Edwards asked us to challenge a long-
standing policy of processing a collection before opening it up for use.
3 See, for example, Greene and Meissner (2005), and Association of
Research Libraries: Exposing Hidden Collections (2009).
Edwards, of course, was less interested in library policy than in
pedagogical opportunity. His syllabus framed variously (un)processed
materials as a chance for students in the seminar to make original
discoveries and contribute to a scholarly effort to describe the
collections: "Part of the unique challenge — and, hopefully, the
excitement — of this seminar is that the three collections we will be
working with are in different states of organization. At times,
especially with the Baraka Papers, we will be investigating boxes with
little or no information about what we might find in them. On the one
hand, this means that to a certain extent we will have the opportunity
to discuss the collecting practices (inchoate as they might sometimes
be) of these intellectuals themselves, as they gathered and stored a
range of materials over many years. On the other hand, this means
that, by noting the layers of material in a given box, by tracking
sources, by deciphering handwriting, by dating an artifact or
manuscript, by annotating and explicating, in the long run we will be
helping RBML in the effort to catalog and make available to future
researchers these enormous and currently unwieldy collections."
For Edwards's purposes, each collection was ideal for the sort of
practical engagement and various theoretical issues attending the
formation of archives that he had in mind. The C. L. R. James Papers
were really an artificially constructed assemblage of materials by and
relating to James that had been harvested from a variety of sources in
North America and the UK. One of the most important social and
political philosophers and activists of the 20th century, James led a
peripatetic life. He was careless about his personal effects, choosing
instead to put his energies into philosophy, literature, and global
politics. Friends and acolytes salvaged what they could along the way,
shaping his legacy in this way, and those efforts resulted in what
forms the core of the James Papers at Columbia. At the other extreme,
Amiri Baraka has lived in the same house in Newark, NJ for more than
fifty years. He is also an inveterate collector, harvester, and saver. But
he is not a disciplined or organized one, and so his papers, while
voluminous, reflect an idiosyncratic self-archiving that could be called
comprehensive, organic, or simply disorganized.
Black Radicalism and the Archive, then, would be an ambitious
and wide-ranging course. It would focus students on the material
aspects of archives that document activist movements, as well as on a
larger set of theoretical concerns entailing the formation of political
and cultural identities. Working with the class on this level was a new
and exhilarating task; the challenge was to derive from material
archives the making of a broader movement. As Edwards planned his
course and conceived of readings and assignments trained on these
materials, it became clear that the collective research he had in mind
would require its own unique and equally ambitious support. At the
center of that support would be a digital workspace, allowing him and
his students to select, organize, share, and appraise each other's
findings from three very different collections.
A New Model of Support
Once Professor Edwards conceived of a virtual space in which to
coordinate and analyze the range of materials that students in his
seminar would be considering, he turned to another division of
Columbia University Libraries' Information Services: the Columbia
Center for New Media Teaching and Learning (CCNMTL). CCNMTL's
mandate is to facilitate and advance teaching at Columbia through the
purposeful use of new media. Over its ten years of operation,
CCNMTL has grown into a service enterprise that supports over 4,000
instructors at Columbia University. 4 It was the natural group to help
design and implement what would come to be called the Black Radical
Archive, a digital workspace for Edwards's seminar.
Serendipitously, CCNMTL's priorities had been evolving in
parallel with RBML's; each group was exploring new ways to
incorporate collections into teaching and learning. In 2007 CCNMTL
4 More information about CCNMTL's services and projects is online at
http://ccnmtl.columbia.edu.
had launched a strategic initiative called Digital Bridges 5 specifically
devoted to the development of innovative connections between
curated collections and classroom-based study. Digital Bridges
projects have resulted in a variety of educational environments and
tools that draw on collections held by museums, public media
producers, documentary filmmakers, scientific laboratories, and
academic libraries. As it cultivates new ways to incorporate collections
into teaching and learning, CCNMTL is increasingly reliant on
collaborative relationships formed with Columbia University Libraries
units such as RBML.
Supporting Edwards's course, in fact, drove new levels of
coordination between RBML and CCNMTL. Though each division
digitizes materials as part of its service mandate, the purposes, time-
frames, and quality levels of such digitization vary to a great extent.
Digitization of source materials is, of course, intrinsic to CCNMTL's
work supporting multimedia instruction at Columbia, and yet much of
this digitization is conducted 'just in time' within the context of an
individual semester or project; it does not conform to archival and
preservation standards. At the same time, RBML conducts a robust
digital program of its own. Partnering with Columbia's Libraries
Digital Program Department (LDPD) and the Preservation and
Reformatting Department (PRD), RBML's digitization is performed to
high quality standards and often tied to extracurricular exhibitions
and events. Setting up and stocking the Black Radical Archive, then,
entailed coordinating these various digitization practices across
CCNMTL, RBML, and units in the library supporting preservation-
level digitization.
Shortly before the semester began, Edwards worked with RBML
staff to identify items that he wanted to make available by dates
pegged to his syllabus. The standard workflow for digital exhibition of
RBML materials involves selection by curators, digitization to
5 Background on Digital Bridges and description of various projects
developed by this initiative is available at
http://ccnmtl.columbia.edu/digitalbridges.
preservation standards by PRD, and design and presentation of
derivative versions of the content by LDPD. But in this case, the
tighter deadline was more typical of CCNMTL's course-specific
digitization or an e-reserves process than a standard high-quality
digitization workflow; in some instances Edwards needed materials
scanned and posted to the digital archive in as little as one week.
This compressed timetable was partly due to the
unprocessed condition of some of the items: it took Edwards and
RBML staff some time to conduct an informal survey of uncataloged
materials that would be pertinent to the course.
When there was enough lead time, Edwards's initial selections
from the collections were processed in the traditional way: digitized in
the libraries' preservation lab to archival standards, with a derivative
version passed on to CCNMTL to post to the Black Radical Archive. In
many cases, though, the pressing deadlines of Edwards's syllabus
meant that shortcuts and workarounds were necessary. Edwards's
engagement was crucial here: whenever necessary, he was ready to
spend time in the collections himself, taking digital pictures for the
Black Radical Archive. He was thus able to give his class online access
to items from collections whose traditional digitization would have
taken a long time, if it were attempted at all. For example, Edwards
was interested in a series of scrapbooks assembled by Hubert
Harrison; these scrapbooks helped underline the course's emphasis on
this activist's quirky and suggestive arrangement of materials.
However well-suited such scrapbooks were for the course, archival-
level digitization of such complex objects is a daunting, labor-
intensive prospect, necessitating correlation of page components and
layers. Assisted by RBML curators, Edwards conducted his own,
decidedly non-archival imaging of these scrapbooks.
While this survey was taking place, CCNMTL was also consulting
with Edwards to set up the Black Radical Archive, the website in
which digital surrogates generated for, and eventually by, the class
would collect. As a digital hub supporting student investigation of
these collections, the Black Radical Archive was structured to help
students quickly find as well as contribute items. Selecting an easily
modifiable platform, CCNMTL built the archive using a lightly
skinned instance of the open source Drupal content management
system. Drupal allowed for quick set-up and modification of content
types, as well as suitable administrative access for the fifteen or so
students enrolled in the course, some of whom were visiting from
other institutions.
The website was set up and integrated with Columbia University's
authentication system shortly before the semester began. An alternate
form of registering was also appended to the site in order to
accommodate the visiting students. Authenticated students were
allowed to search and access all content posted in the archive, post
comments, browse and edit metadata, and upload files. A higher level
of administration was reserved for CCNMTL so that the archive could
be further designed, modules updated, and user roles managed. The
interface was set up so that students could filter records by collection,
scan all collection locations, and quickly browse assets by column
sorting of titles and media types.
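The two access levels described above can be pictured as a small permission table. The following is only an illustrative sketch in Python: the role names, the permission names, and the `can` helper are hypothetical labels chosen for this example, and they are not taken from Drupal's own configuration or from the site CCNMTL built.

```python
# Hypothetical role and permission names; the chapter describes what each
# group could do, not how the permissions were configured in Drupal.
STUDENT_PERMISSIONS = frozenset(
    {"search", "view", "comment", "edit_metadata", "upload"}
)

# CCNMTL staff could do everything students could, plus site administration.
ADMIN_PERMISSIONS = STUDENT_PERMISSIONS | {
    "design_site",
    "update_modules",
    "manage_roles",
}

ROLES = {
    "student": STUDENT_PERMISSIONS,
    "ccnmtl_admin": ADMIN_PERMISSIONS,
}


def can(role: str, action: str) -> bool:
    """Return True if the role includes the action; unknown roles get nothing."""
    return action in ROLES.get(role, frozenset())
```

In this sketch a visiting student registered through the alternate form would simply be given the same "student" role as an authenticated Columbia student.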
To support locating and indexing items from the collections, and
the box descriptions that would be the seminar's capstone assignment,
CCNMTL agreed to structure the repository with a simple and
customized data model. In this model, every asset uploaded into the
archive had to be connected to a location. Usually these locations were
boxes (with a numbering convention defined for the class). Because
assets from outside the three collections might also be uploaded to the
digital archive, a generic "Outside" location category was also created.
Metadata for the assets was kept as simple as possible — far simpler
than typical library description — to help make the eventual student
input of new assets into the archives as frictionless as possible.
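To make the shape of this data model concrete, here is a minimal sketch in Python. It is an assumption-laden illustration rather than the schema CCNMTL actually implemented: the class names, field names, and the `by_collection` helper are hypothetical, and the fields are limited to the metadata the chapter mentions (title, media type, collection and box location, and supplementary notes).

```python
from dataclasses import dataclass
from typing import List

# Generic location category for assets from outside the three collections.
OUTSIDE = "Outside"


@dataclass(frozen=True)
class Location:
    collection: str  # a collection name, or OUTSIDE
    box: str = ""    # box label under the class numbering convention


@dataclass
class Asset:
    title: str
    media_type: str
    location: Location
    notes: str = ""  # supplementary notes, kept deliberately free-form


class Archive:
    """Holds assets and enforces the one rule: every asset has a location."""

    def __init__(self) -> None:
        self.assets: List[Asset] = []

    def add(self, asset: Asset) -> None:
        if not isinstance(asset.location, Location):
            raise ValueError("every asset must be connected to a location")
        self.assets.append(asset)

    def by_collection(self, collection: str) -> List[Asset]:
        """Filter records by collection, sorted by title for browsing."""
        matches = [a for a in self.assets if a.location.collection == collection]
        return sorted(matches, key=lambda a: a.title)
```

Keeping the record this small mirrors the design choice described above: a student needs only a title, a media type, and a box to contribute an item.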
Shortly before the class met, CCNMTL worked closely with
Edwards to stock the archive with the many pictures he had taken of
scrapbooks and other material in the Hubert Harrison collection.
Since some of these files were too large to be uploaded directly
into the Drupal site, CCNMTL arranged supplementary
server space through the Libraries Digital Program (LDP) at
Columbia. This too was a first: LDP had never before stored files that
were not generated by professional staff at preservation-level
standards. 6 Since Edwards's selections for the digital archive included
aging audio and video tapes of various formats, several other
variations on traditional digitization processes occurred during the
semester as RBML and CCNMTL staff continued to add assets to the
digital archive, using Edwards's syllabus as a roadmap. RBML sent old
audio tapes out to a third-party vendor; CCNMTL, more experienced
with video processing, digitized some VHS tapes and, in the case
of some badly damaged tapes, also commissioned an outside vendor.
All of these assets, generated through various means, were uploaded
into the Black Radical Archive in time for students to access them for
specific class sessions. The Black Radical Archive, then, grew as a
result of diverse support interactions and inputs, ranging from
archivists to the instructor to various preservation specialists to
educational technologists and, finally, the students in this unique
course.
Classroom Support and Implementation
Class sessions were all held in a conference room in RBML, and
supporting these sessions was a logistical challenge in its own right —
involving coordination of cartons of material, presentations by
students of their discoveries, and collective access to the Archive
during class discussions. Participation of RBML staff in the class was
critical. Since almost none of the students in the course had any
significant experience with or exposure to primary source materials
prior to the course, an RBML archivist who was assigned to
support the course gave the whole group a short primer in
archives and archival management, reviewing with them the
fundamentals of acquiring, surveying, processing, and providing
access to archival collections. In addition, she met with students
individually throughout the course to answer their questions and help
6 In moving in-house library digitization support towards more informal
ground, this project resonated with calls from Davidson (2009), Dooley
(2009), and many others to prioritize services for an expanding user base for
special collections; Dooley urges us to "digitize with abandon."
them with their assignments. In effect, she served as a TA without
portfolio. This as-needed support was provided in lieu of more formal
training of Edwards's students in archiving practice, which would
have overloaded a syllabus already rich with cultural theory and
historical background about the seminal activists being studied. The
goal of the course, after all, was to get students engaged with archival
practice as a point of entry into more theoretical study of the practices
and influences of Harrison, James, and Baraka.
Another component of class support was the accommodation of
important guest speakers. In addition to noted scholars and experts,
Baraka himself visited one session and explained his own "archival
techniques," thus providing a further layer of documentation and
information to add to the Black Radical Archive. Each of the guest
sessions was taped, and the tapes became part of the evolving
narrative of activism and archiving constructed on the foundations of
the actual primary source materials, affording further material for the
growing digital archive accompanying Edwards's course.
In the second half of the semester, Professor Edwards gave the
class instruction on uploading items into the Black Radical Archive.
Students were then ready to undertake the culminating assignment:
each one was assigned a box in the unprocessed Amiri Baraka
collection and then asked to inventory the contents of their box. To do
so, they would consult with RBML archivists, take digital pictures of
particularly interesting items they discovered in their box, and upload
box inventories and item photographs into the digital repository.
Though Edwards felt that this assignment could yield some interesting
information for RBML, in the form of preliminary inventories of
uncatalogued boxes that were modeled on inventories of processed
collections, the goal was not to produce authoritative documentation.
Instead, students were asked to produce lists of contents, with as
much contextual information (such as the date of an event, or the
identity of a figure in a photograph) as they could ascertain. These
unofficial selections and annotations by students joined the more
professionally catalogued and digitized material already uploaded into
the Black Radical Archive.
Even though Edwards's seminar met regularly in a conference
room in RBML and could retrieve and display physical collections,
digital surrogates from the archive were often displayed instead of
actual objects during class. This was a matter of convenience and
efficiency; discussion could engage details from many objects and
could proceed in spontaneous directions without being bogged down
by physical retrieval and replacement of items. Some items were
simply too fragile and cumbersome for the students to pass around;
the digital archive facilitated quick pinpointing of specific parts of
these objects during discussion, without loss of the visual complexity
that would have been hard to perceive in, say, photocopies.
Digitization of a variety of old audio and video formats meant that the
class could collectively consider such multimedia items without
having at hand various playback instruments. The digital archive also
facilitated convenient distribution of long manuscripts in the
collections, which were typically printed out by individual students for
reference before class and brought in for group perusal.
Students later described the "contagious" energy of their own
uploading to the Black Radical Archive of discovered items from the
Baraka boxes: their digital staging of material that would then be
contextualized by findings in other boxes. This staging spurred
discussions about theoretical and practical considerations of
archiving. Items with particularly complex provenance gave rise to
particularly lively conversation during the seminar. For example, the
Baraka boxes included audio tapes with liner notes scrawled on them
by the likes of Nina Simone and Allen Ginsberg, tapes that may very
well have been owned by these other luminaries at some point.
Collective appraisal of such objects, facilitated by the Black
Radical Archive, helped students think about the complexities of
provenance, the layers of interventions in any archive by original
collectors and subsequent organizers of material, and the artificiality
of any one preserved state of a collection. One student ambitiously
documented each "layer" that he uncovered in his box as he dug
through it rather like an archaeologist, and shared these layers with
the rest of the class to spur discussion of patterns of proximity across
the Baraka collection, patterns that a classmate went so far as to term
the "poetics of the archive." More simply, in-class consideration of
items uploaded by students helped them compare contents of their
assigned box to those in other boxes, and therefore gain various
chronological, thematic, and material views of the unprocessed
collection.
Participation in the course not only immersed Edwards's
graduate students in the practice of archival research, it also suggested
to them the usefulness of digital surrogates as a component of such
scholarly practice. One student discussed the way that a preliminary
inventory of items scanned by the instructor and classmates helped
him to identify patterns and target investigation of specific objects in
the archive — in short, to make more efficient use of his time with the
physical collections. Another student was prompted to think about the
benefits of informal, collaborative digitization for her own field of
research, modern Egypt; items of interest for this topic
are as likely to be discovered on the likes of eBay as in a formal
archival collection, and a digital hub, this student speculated after
working with the Black Radical Archive, could help scholars
distributed around the world coordinate findings.
Though in-class use of the Black Radical Archive was heavy, very
few annotations of individual objects were posted on the site, despite
the site's capacity to connect annotations to uploaded items. This was
partly due to the structure of assignments in the class; by the time
students were engaged in digitizing items from the unprocessed
Baraka collection and comparing the contents of their assigned boxes,
the semester was drawing to a close; annotation of various objects
happened verbally as a matter of class discussion. In addition to a
main projector and screen, several individual laptops anchored
discussion with reference to various items in the digital archive. Class
discussion anchored theoretical readings to the items stocked by
Edwards in the Black Radical Archive and, as the semester proceeded,
it was increasingly concerned with targeted identification of
similarities and patterns across the collections. Student annotations
that were uploaded into the archive, then, tended to be appraisals of
groups of objects rather than of individual objects themselves.
As one of the few students who came into the course with previous
experience with archives stated, the digital archive served as a
"collective interpretive project," not a "collective digitization project."
Had the class's emphasis been on building an online library, rather
than on sharing an arena of interpretation, students would have
concentrated on assembling and posting deeper metadata. Certainly
the metadata in the Black Radical Archive is at present too sparse to
serve researchers outside the course; it is limited to item titles,
collection and box locations, and supplementary notes. Asking
students in a future class to enrich the digital archive with more
rigorous and informative descriptive metadata could be an interesting
further educational use of the archive, should Edwards wish to further
develop it as a resource for other researchers, or it could be a task
assigned to a graduate student working with RBML archivists.
Edwards's students were intrigued by the idea that their selections
could help inform subsequent work with the three Columbia
collections, though they agreed that subsequent cataloging would be
necessary to make the digital archive useful to researchers beyond
their class. Their most substantial legacy posted to the Black Radical
Archive was, in fact, item-level description of unprocessed boxes,
descriptions that ran to notable length in some cases and proved
detailed enough to assist RBML in its eventual processing of the
Baraka Papers.
The Archive beyond the Class
The Black Radical Archive survives the course it was built for, not as a
comprehensive representation of the collections it draws on, but,
rather, as an expanding documentation of engagements with these
archives. It will thus raise awareness of and interest in these rich
collections, and, we hope, entice more teachers, students, and
researchers into direct contact with archival holdings.
From the perspective of a special collections unit nested within a
research institution, the lessons learned were several and significant.
First, the project challenged two major sets of operating policies and practices:
to wit, that unprocessed collections should not be made available prior
to processing, and that digitization projects should only be designed
and undertaken for fully processed materials with stable metadata.
Opening up unprocessed collections to students proved to be
pedagogically and archivally important. In giving students access to
primary sources prior to library mediation, the class allowed students
to participate in the literal making of the historical record. At the same
time, it generated descriptive information at a highly granular level
that will eventually be of great use to processing archivists. The
detailed box content lists provided by the students proved to be of a
high quality; given the fact that the students brought to their tasks a
depth of subject knowledge not typically available to a processing
archivist, their contributions to the future organization of these
collections will be important. Although it will not be possible to do this
across the board in the future, involving students and scholars in the
description of collections prior to their processing is an option we
should try to exploit on a more regular basis — whether or not it is tied
to an actual course.
Digitization projects in the Columbia Libraries are complex,
formal, and labor-intensive. Standardization across the system is
rigorous and necessary if the products are to be integrated into a
larger common pool of digital assets. "Scan once not twice" is dogma.
Such projects are thus developed in a team environment, subjected to
a demanding vetting process, and implemented in a highly controlled
fashion. These projects typically reflect some consensus within the
Libraries as to which materials would be of most use to researchers
when presented on the web. The assumption is that if we build it, the
research community will find it and use it.
The lesson we learned from the collaboration is that we need to
accommodate a more diverse set of needs for the generation of and
access to digitized material. Aside from the benefit of surfacing
previously hidden holdings, there may be good pedagogical reasons
for doing so. Digitizing for the Black Radical Archive was entirely
user-defined; RBML staff had no role in the selection process other
than facilitating access and offering advice in response to the course
instructor and students. The scans that resulted were of considerable
value to the course, and they will always be available in the future
should they need to be repurposed in some way. RBML has worked
with CCNMTL on many occasions in the past in contributing to web-
based course support. What was new here was that the content of that
support was driven by users, not librarians. This project was bottom-
up, and its success encourages us to look for future opportunities to
re-engineer RBML's workflow to dovetail with CCNMTL's digital
facilitation of learning activities.
This type of project is especially staff-intensive and
time-consuming. Archivists assigned to the course had to invest
considerable time working with Edwards on selecting materials, with
other staff helping to process digitization, and with students in
Edwards's course using these collections on a weekly basis. Moreover,
they needed to attend every session of the course. CCNMTL staff had
to be in constant communication with Edwards, RBML, other library
units, and outside vendors to keep the digital archive coordinated to
an unfolding syllabus. In the age of "sustainability" and "less process,
more product," this customized support sins in many ways. But while
it is not a working model for day-in, day-out course support, it does
provide interesting options for boutique projects, and it serves to
remind us how badly we need the boutique along with the mainstream
and the generic. Research institutions are congeries of expertise, and
special collections units need to engage actively with complementary
types of expertise (such as that cultivated by instructional technology
groups, as well as the faculty and students we all serve) as part of
their routine work.
Developing any course ab ovo is arduous and challenging, as
anyone who has ever taught knows. But now that it has been
developed and a model of it remains via the Black Radical Archive, the
course — and intradivisional relationships formed in support of it —
can be reused in various ways and for various purposes. We look
forward to trying similar approaches with other faculty using a
different menu of materials for different purposes. In the meantime,
informal digitization and annotation of unprocessed material
contributed to the forming of a unique engagement with our
collections, and our task going forward is to preserve and make
accessible this effort of interpretation as part of the Harrison, James,
and Baraka papers.
Special collections and educational technology units considering
these types of partnerships should be prepared to devote significant
staff resources to them. In our view, the effort is well worth it, since it
directly integrates unique holdings, faculty and students, and the
diverse capacities of support units into the heart of our institution's
academic mission. And that, after all, is why we are here.
The authors would like to thank Brent Edwards, Susan Hamson,
Laura Helton, Lea Osborne, Patricia Renfro, and Lytton Smith for
their assistance with this article.
References
Association of Research Libraries. (2006). Special collections task
force final status report, 2006. Retrieved March 29 from
http://www.arl.org/rtl/speccoll/spcolltf/status0706.shtml
Association of Research Libraries. (2006). SP296 Public services in
special collections, November 2006. Retrieved March 29 from
http://www.arl.org/bm-doc/spec296web.pdf
Davidson, S. (2009). If we build it, will they come? Strategies for
teaching and research with digital special collections. RBM, 10(1),
37-49.
Dooley, J. (2009). Ten Commandments for special collections
librarians in the digital age. RBM, 10(1), 51-59.
Greene, M., & Meissner, D. (2005). More product, less process:
Revamping traditional archival processing. American Archivist,
68(2), 208-263.
Prochaska, A. (2009). Digital special collections: The big picture.
RBM, 10(1), 13-24.
Smith, S. (2006). From "treasure room" to "school room": Special
collections and education. RBM, 7(1), 31-39.
Traister, D. (2003). Public services and outreach in rare book,
manuscript, and special collections libraries. Library Trends,
52(1), 87-108.
Part IV - One Plus One is
Greater Than Two:
Collaborative Projects
Digital Treasures: The Evolution of a
Digital Repository in Massachusetts
Dodie Gaudet (Central MA Regional Library System)
Kristi Chadwick (Central/Western MA Automated
Resource Sharing System)
Jan Resnick (Western MA Regional Library System)
Abstract
Digital Treasures is a digital library collection of the history of central
and western Massachusetts. It is a collaborative project among
Central/Western MA Automated Resource Sharing System (C/W
MARS), Central MA Regional Library System (CMRLS) and Western
MA Regional Library System (WMRLS). Initiated by C/W MARS in
2006, Digital Treasures began as a pilot program when C/W MARS
purchased equipment and software and set up a scanning lab at its
headquarters in Worcester. Currently Digital Treasures has 36
collections from libraries, with over 1,300 accessible images. C/W
MARS, CMRLS and WMRLS continue to collaborate on ways to bring
funding, selection guidance and metadata expertise to their member
libraries and bring access to the wealth of cultural history of the
Commonwealth.
Keywords: Central Massachusetts Regional Library System
(CMRLS), Central/Western Massachusetts Automated Resource
Sharing (C/W MARS), Digital Barn Raising, Digital Treasures, Digital
Commonwealth, Western Massachusetts Regional Library System
(WMRLS)
424 The Evolution of a Digital Repository in Massachusetts
History & Background
In the early 1960s, three Regional Library Systems were formed in
Massachusetts to provide a variety of services directly to the public
libraries in the western, central, and eastern parts of the state. The
services were (and still are) provided at no cost since the Regions are
funded by the state. When automation began entering libraries some
20 years later, the WMRLS (Western Massachusetts Regional Library
System, 2010) and CMRLS (Central Massachusetts Regional Library
System, n.d.) were instrumental in helping to establish a fee-for-
service automated network available to all types of libraries named
Central/Western Massachusetts Automated Resource Sharing or C/W
MARS (C/W MARS Inc, 2009).
In 1998, the Regional Library Systems became multi-type. All
public libraries are automatically members of a Massachusetts
Regional Library System (MRLS). All academic, school, and special
libraries are eligible for membership if they meet basic qualifications.
Any WMRLS or CMRLS member can join C/W MARS by paying the
membership fees. Not all libraries choose to join C/W MARS: some
are small and cannot afford the fees; others are larger, have their own
network staff and prefer a stand-alone Integrated Library System. The
number of libraries in WMRLS and CMRLS is larger than the number
in C/W MARS. As of March 1, 2010, membership in the three
organizations broke down as follows:
            CMRLS   WMRLS   C/W MARS
Academics      17      19         11
Publics        72     103        130
Schools       120     164          2
Specials       26      25          4
Total         235     311        147
With the imminent development of Digital Commonwealth, a
state-wide portal and digital repository, CMRLS and C/W MARS were
both interested in setting up a small lab so that libraries could begin
scanning photos, documents and other items which could also be
discovered via the state-wide portal. A Regional Library System and
an automated network in the eastern part of the state had each
purchased a scanner and offered instruction on its use, but few
libraries took advantage of the opportunity. Considering this, C/W
MARS proposed that one of its tech-savvy staff members who had an
interest in digital photography be in charge of the lab and do the
actual scanning if local library staff supplied the metadata. CMRLS
had a cataloger on staff who could conduct workshops on Dublin Core
and be available for metadata consultation. Since the offices of the two
organizations are a mere seven miles apart, joining forces to support a
single repository made the most sense and Digital Treasures (see
Digital Treasures, 2009) was born.
The Scanning Lab
Due to some new memberships and some existing members
upgrading, C/W MARS had enough discretionary funds to purchase
hardware and software and create a new part-time position.
Hardware for the new lab consisted of an Epson Expression
10000XL flatbed scanner with an 11" x 17" bed. A new server had
recently been purchased for the network's regular activity and the old
server was used for Digital Treasures. After much research, OCLC's
CONTENTdm was chosen for the creation, storage and access to
digital materials. Anticipating the future growth of Digital Treasures
and the inclusion of text, OCR extensions were purchased at that time.
Other programs loaded to the digital lab workstation included Adobe
Photoshop CS2 for creation of derivative files, Adobe Acrobat for PDF
conversion, and Datacolor Spyder 3 for display color calibration.
The Pilot Project
The principals involved in the Pilot Project were Michael Bennett, the
new part-time Digital Initiatives Librarian at C/W MARS; Dodie
Gaudet, Consultant for Bibliographic and Technical Services at
CMRLS; and Jan Resnick, Assistant Regional Administrator for
WMRLS. Dodie and Jan worked directly with the libraries in their
respective Regions; Michael, who had previously attended the
Northeast Document Conservation Center's (NEDCC) School for
Scanning, did all of the scanning and was available to answer the
technical questions.
First some ground rules were established for the project. It was
decided that the subjects of Industry and Agriculture would allow for
maximum participation yet give the repository a focus. Concentrating
on pre-1923 objects would avoid copyright issues. Each library would
be limited to 20 scans. Materials were to be two-dimensional, no
larger than 11" x 17", and in good enough condition to be handled
without falling apart. Once library staff had created the metadata, they
would be responsible for transporting their items to and from the C/W
MARS headquarters in Worcester.
Information sessions were held in each Region to promote
Digital Treasures and explain the Pilot Project. Next, Michael, Dodie
and Jan developed an online questionnaire which was sent to the
directors of all Regional member libraries. Directors were asked if
they were interested in participating in Digital Treasures, how much
material they owned, what kinds of materials they owned, how much
they ultimately planned to contribute to Digital Treasures and related
questions.
As the cataloger, it was Dodie's responsibility to learn Dublin
Core and develop a two-hour workshop. In preparation for the project,
she attended Introduction to Digitization offered by NEDCC in
October of 2005 and their Persistence of Memory Conference in
November of the same year. Using the Dublin Core Metadata
Initiative (DCMI) site (Dublin Core Metadata Initiative, 2008) and the
Bibliographic Center for Research (then Collaborative Digitization
Program) Best Practices Document (CDP Metadata Working Group,
2006) she synthesized material on Dublin Core for Digital Treasures
participants.
Simple Dublin Core was used so as not to overwhelm the staff
creating the metadata since most of them were not catalogers. The
handouts are available on the Digital Treasures site. Workshops were
scheduled in both the Western and Central Regions. Participants
brought examples of the objects they planned to have scanned and
practiced their Dublin Core descriptions on paper forms.
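
To make the idea of a Simple Dublin Core description concrete, the following is a minimal sketch of how one record might be expressed and serialized. It is not the Digital Treasures workform: the function name build_record and all sample values are invented for illustration. The fifteen element names and the two namespace URIs are those of the standard Dublin Core element set and the oai_dc container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of Simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_record(fields):
    """Serialize a dict of Dublin Core values as an oai_dc XML string.

    Hypothetical helper: a value may be one string or a list of strings,
    because every Simple Dublin Core element is optional and repeatable.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values, in the spirit of the Pilot Project themes.
sample = {
    "title": "Workers outside a textile mill",
    "subject": ["Textile industry", "Mills and mill-work"],
    "description": "Group portrait of employees in front of a mill.",
    "date": "1905",
    "type": "Image",
    "identifier": "example-001",
}

record_xml = build_record(sample)
print(record_xml)
```

Keeping the description to these fifteen unqualified elements is what makes the record approachable for staff who are not catalogers, at the cost of less precise semantics than a qualified schema would offer.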
The first library to complete the metadata for 20 images was the
Jacob Edwards Library in Southbridge. The descriptions were keyed
into an online form and e-mailed to Michael at C/W MARS. The
photographs were delivered to C/W MARS for scanning and matched
to the metadata. By the end of the Pilot Project in August 2006, 24
libraries, twelve from each Region, each had a digital collection of 20
images on Digital Treasures. Libraries also received a CD of their
images to use for publicity or any other purpose they chose.
To allow the repository to reach beyond the Commonwealth's
borders, Digital Treasures metadata have always been compatible with
the OAI standard (Open Archives Initiative, n.d.). The metadata are
harvested both by the Digital Commonwealth and OCLC's OAIster
project.
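
For readers unfamiliar with OAI harvesting, a harvester such as those described above typically issues plain HTTP requests against a repository's OAI-PMH endpoint. The sketch below only builds such a request URL. The verb ListRecords and the metadata prefix oai_dc are defined by the OAI-PMH protocol; the base URL, the set name, and the function name are placeholders, not the actual Digital Treasures endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url and set_spec are supplied by the caller; the values used
    below are hypothetical examples.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict the harvest to a single collection (an OAI "set").
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = list_records_url("https://repository.example.org/oai", set_spec="pilot")
print(url)
```

A harvester would fetch this URL, parse the returned XML, and follow any resumption token the repository supplies until the full set of records has been retrieved.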
When specifications were being developed for Dublin Core
records, the principals decided to use Library of Congress Subject
Headings so that the records would integrate well with the MARC
records in the online catalog. After the Pilot Project launched, Michael
began working with staff at OCLC, C/W MARS and member libraries
to crosswalk the Digital Treasures Dublin Core metadata into MARC
format that could be loaded into OCLC WorldCat. As a subscriber to
OCLC, C/W MARS would not only have their digital records available
online, but would be able to export the records into their own public
catalog with referring links back to the collections. Collections
available in October 2007 were harvested and their MARC records are
now available through WorldCat and the C/W MARS online public
catalog as well as Digital Treasures.
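
The crosswalk itself is not documented in this chapter, so the following is only an illustrative sketch of the general technique. The field mapping is an assumption, loosely following the Library of Congress crosswalk for unqualified Dublin Core; the mapping actually used by C/W MARS and OCLC may well differ (for example, Library of Congress Subject Headings might be placed in a 650 field rather than 653). The identifier is assumed to be a URL.

```python
# Assumed mapping: Dublin Core element -> (MARC tag, subfield code).
# Illustrative only; not the mapping used for Digital Treasures.
CROSSWALK = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),  # assumes the identifier is a URL
    "rights": ("540", "a"),
}


def crosswalk(dc_record):
    """Convert a Dublin Core dict into (tag, subfield, value) triples.

    Repeated elements produce repeated MARC fields; elements without a
    mapping are skipped.
    """
    fields = []
    for element, values in dc_record.items():
        if element not in CROSSWALK:
            continue
        tag, code = CROSSWALK[element]
        for value in ([values] if isinstance(values, str) else values):
            fields.append((tag, code, value))
    return fields


marc_fields = crosswalk({
    "title": "Workers outside a textile mill",
    "subject": ["Textile industry", "Mills and mill-work"],
    "identifier": "https://repository.example.org/item/1",
    "coverage": "not mapped in this sketch",
})
for tag, code, value in marc_fields:
    print(f"{tag} ${code} {value}")
```

The 856 field is what gives a catalog record its referring link back to the digital object.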
After the Pilot Project
Up to this point, all costs for Digital Treasures had been assumed by
C/W MARS, but the organization could not justify supporting this
service exclusively with membership fees. The decision was made for
C/W MARS to subsidize Digital Treasures, and charge participants a
portion of the costs involved in scanning and maintaining the digital
repository. With a vote of the membership, the following fee schedule
was instituted and is still in effect:
All C/W MARS members are allowed 20 free scans. Each scan
beyond the first 20 is $8.50 (50% of the actual cost to digitize an
object). After scanning, the library pays a $1.00/image annual
maintenance fee. Libraries that are not C/W MARS members receive
no free scans and pay $12.75/scan (75% of the actual cost) plus a
$1.50/image annual maintenance fee.
The annual fee is assessed for the maintenance of each digital
image's permanence over time through future platform migrations
and regularly scheduled data refreshment cycles. It also covers
maintenance of online delivery through CONTENTdm and the
allowance of OAI metadata harvesting.
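
The fee schedule can be worked through with a short calculation. The sketch below encodes the published rates in cents; the class and function names are invented, and it assumes that the annual maintenance fee applies to every image held, including the free scans, which the schedule implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeeSchedule:
    free_scans: int        # scans provided at no charge
    scan_fee_cents: int    # charge per scan beyond the free allowance
    annual_fee_cents: int  # yearly maintenance charge per image


MEMBER = FeeSchedule(free_scans=20, scan_fee_cents=850, annual_fee_cents=100)
NON_MEMBER = FeeSchedule(free_scans=0, scan_fee_cents=1275, annual_fee_cents=150)


def scanning_cost(schedule, total_scans):
    """One-time scanning charge, in cents, for a library's total scans."""
    billable = max(0, total_scans - schedule.free_scans)
    return billable * schedule.scan_fee_cents


def annual_maintenance(schedule, images_held):
    """Yearly maintenance charge in cents.

    Assumption: the fee is charged on all images, free scans included.
    """
    return images_held * schedule.annual_fee_cents


# A member library that grows its collection from 20 to 40 images.
extra_scanning = scanning_cost(MEMBER, 40) - scanning_cost(MEMBER, 20)
extra_annual = annual_maintenance(MEMBER, 40) - annual_maintenance(MEMBER, 20)
print(f"scanning: ${extra_scanning / 100:.2f}, annual: ${extra_annual / 100:.2f}")
```

Working in whole cents avoids the rounding surprises that floating-point dollar amounts can introduce.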
Once the Pilot Project ended, activity in Digital Treasures slowed
down drastically. A couple of additional information sessions were
held along with another Dublin Core workshop. Everyone liked the
idea of having photographs and documents from their local history
collections digitized, but had difficulty finding the time to select
objects and describe them in a useful way. Even though the financial
situation was relatively healthy at the time, some libraries found they
could not justify the scanning fees. In the fall of 2007, Dodie applied
for and received a grant from the Greater Worcester Community
Foundation (GWCF) to cover scanning fees for libraries. The grant
was called the Central Mass. Memory Project.
More importantly, the grant from GWCF funded a metadata
specialist to work with the libraries selecting and describing their
objects for digitization. Jeff Monseau, a Library and Information
Science student at Simmons College who also worked part-time with
the Archives of Mount Holyoke College, was hired as a "circuit rider."
Because of the mission of GWCF, funds could only be used to help
libraries within the Central Region. For those libraries, Jeff's
experience, even the mere fact that he was available to help, was a
tremendous boon and several new libraries added 20 images while
other libraries increased the size of their digital collections. The grant
funds enticed some non-C/W MARS members to participate in Digital
Treasures which broadened the collection.
In early 2008, Michael moved on to the University of Connecticut
and Kristi Chadwick became the administrator of Digital Treasures.
Digital Treasures on the Road
Digital Treasures had been up and running successfully for over two
years. Digital Commonwealth had developed a repository as well as a
portal and was publicizing its services, stimulating more interest in
digitization. For libraries that had not yet taken the step, the Technical
Services Section of the Massachusetts Library Association (MLA/TSS)
organized a one-day conference in the fall of 2008. The committee
chose the topic Introduction to Library Digitization and invited
several people from throughout New England to talk about their
projects, how the projects were started and to describe the practical,
day-to-day aspects. Kristi, Dodie and Jeff were asked to talk about
Digital Treasures, its technical details, and the Central Mass. Memory
Project.
A Digital Barn Raising
As one travels westward through the Commonwealth, the area
becomes more sparsely populated. Most towns have libraries, but the
libraries are often small and some are staffed by a single person. The
median population in western Massachusetts is 1900. Springfield, the
largest community, has over 150,000 residents, meaning some of the
other communities are significantly below the median. The towns are
also rather dispersed by New England standards, and there is little or
no public transportation. The western Massachusetts approach to
Digital Treasures - as with many projects - relies on team work.
In May 2008, Digital Commonwealth received a grant from the
H. W. Wilson Foundation for conversion efforts. The timing was
fortuitous as WMRLS was in the process of organizing a "digital barn
raising", a cooperative effort to get libraries started. To follow the
theme, everyone contributed. Kristi Chadwick brought an
introduction to Digital Treasures. A member of the Digital
Commonwealth Board contributed information about the Digital
Commonwealth and described the grants available from the Wilson
Foundation. A librarian from the Berkshire Athenaeum in Pittsfield
reviewed the major copyright issues involved in digital projects. Dodie
Gaudet presented the basics of Dublin Core and how libraries would
use it. Member libraries brought a few of their local documents. Jan
Resnick coordinated the program and arranged a few extra catalogers
to assist in the process.
The program consisted of presentations, questions and
discussion, and concluded with a work session in which participants
from member libraries began to develop Dublin Core metadata. This
structure worked well for the group. They had the opportunity to ask
questions (there were lots about copyright issues) and then had the
chance to develop initial Dublin Core records for their documents in a
supportive environment. WMRLS hopes to offer additional "barn
raisings" expanding the skill base to a wider group of small libraries.
Their collections contain a broad variety of treasures to be digitized.
Buckland Historical Society, which participated in the barn raising
session, went on to apply for and receive one of the Digital
Commonwealth's Wilson Grants. Collaborating with the Buckland
Public Library, they submitted over 50 objects to Digital Treasures.
The Erving Public Library also received a Wilson grant and
contributed 20 objects to the repository. The idea of the barn raising
generated so much interest that participants were asked to contribute
to a panel discussion at the day-long Digital Commonwealth
conference held in the spring of 2009.
Workflow
Libraries with an interest in having objects digitized contact Dodie,
Jan or Kristi. Items are limited to two-dimensional, 11" x 17"
maximum size for scanning, or audio files in MP3 format. The original
scope of agriculture or industry was lifted, so the theme of the
collection is up to the library. For libraries that are not familiar with
Dublin Core, Dodie assists with training, if based in the Central
Region. Jan and Kristi assist libraries in the Western Region. This can
be done by e-mail, phone, or in-person consultation.
When the library has selected the objects they wish to add, Kristi
gives them access to the password-protected online form. This form
can be printed as a worksheet for initial descriptions, then entered
online and submitted to Kristi by e-mail. The library also receives a
copy for its own records. Once the library submits the first record for
review, then all metadata can be sent.
Once the metadata are received, an appointment is made to go to
the C/W MARS headquarters in Worcester, where the scanning lab is
located. Depending on the number of objects to be scanned, a two-
hour or more block of time is reserved for the library. If the collection
is new, preparation for the library's appointment will include setting
up folders on the CONTENTdm server and workstation and verifying
the administrative metadata, which are kept in a Microsoft Access
database.
When the library representative arrives with the images and/or
documents, all items are matched by the identifier assigned by the
library to the one listed in the metadata. Scanning on the Epson
scanner is done with its native program. All images are scanned in
TIFF format for the "master" files, with a long side of 3000 pixels.
Resolution of the TIFF files ranges from 300 to 600 dpi, to achieve the
desired pixel dimensions. During the scanning, information from the
metadata workform is copied into the MS Access Administrative
Metadata database, including the title given to the object and the
identifier assigned by the library.
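
The relationship between an object's physical size, the scanning resolution, and the resulting pixel dimensions can be sketched as follows. The chapter does not state the exact rule used in the lab, so this is an assumption: the 3000-pixel long side is treated as a target, and 300 and 600 dpi are treated as hard lower and upper limits. The function name and defaults are invented.

```python
import math


def scan_dpi(width_in, height_in, target_long_px=3000, min_dpi=300, max_dpi=600):
    """Choose a scanning resolution for an object measured in inches.

    Assumed rule: use the lowest resolution that yields at least
    target_long_px on the long side, clamped to [min_dpi, max_dpi].
    """
    long_side = max(width_in, height_in)
    if long_side <= 0:
        raise ValueError("dimensions must be positive")
    needed = math.ceil(target_long_px / long_side)
    return min(max(needed, min_dpi), max_dpi)


for size in [(8, 10), (5, 7), (3, 5), (11, 17)]:
    dpi = scan_dpi(*size)
    print(f'{size[0]}" x {size[1]}": {dpi} dpi, long side {max(size) * dpi} px')
```

Under these assumptions a 10-inch photograph is scanned at 300 dpi, a 5-inch snapshot at 600 dpi, and a full-bed 17-inch item at the 300 dpi floor, where it exceeds the pixel target.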
After initial scanning is complete, the TIFF files are manipulated
in Photoshop and converted into the derivatives. This includes
autoleveling and creation of JPEG files. Two separate JPEGs are
created, a full-size image and one at access size for CONTENTdm with
a long side of 600 pixels and 72 dpi. The full size JPEG is loaded to the
CONTENTdm server directly, and the access images are loaded
through the CONTENTdm workstation, where the rest of the
metadata are loaded in from the workforms then uploaded to the
server for web presentation.
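
The access derivative's dimensions follow directly from the master file's. The sketch below shows only that arithmetic, not the Photoshop processing itself; the function name is invented, and the choice never to enlarge a small master is an assumption. The 72 dpi value is a tag stored in the file and does not affect the pixel dimensions computed here.

```python
def access_size(width_px, height_px, long_side=600):
    """Pixel dimensions of an access copy whose long side is long_side.

    The aspect ratio is preserved. Assumption: a master already smaller
    than the target is left at its original size rather than enlarged.
    """
    longest = max(width_px, height_px)
    if longest <= long_side:
        return width_px, height_px
    new_width = max(1, round(width_px * long_side / longest))
    new_height = max(1, round(height_px * long_side / longest))
    return new_width, new_height


print(access_size(3000, 2400))  # landscape master
print(access_size(2000, 3000))  # portrait master
```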
Stumbling Blocks
From the beginning, there has been some resistance from a few
libraries to the digitization of their collections. The reasons vary
widely. Some staff and trustees are concerned that by making
documents available digitally, people will no longer feel the need to
visit their libraries. Only experience will breach this barrier.
There is fear among a few people that digitizing library holdings
and making them available on the World Wide Web exposes a library's
materials to being used fraudulently. There is no evidence of this
happening, but those who feel this way are not easily convinced
otherwise.
The copyright issue of post-1923 material is one that concerns
library staff. Many photographs have been donated to libraries, and
their origins are unknown. As Digital Treasures expands and libraries
have more of their hidden collections scanned, this is an area that
needs more research, more specific direction on what can and cannot
be digitized and how to proceed if there is a complaint. In the
meantime, concern over possibly breaking the law limits what objects
libraries are willing to contribute to Digital Treasures.
The digitization process has developed into a "three-legged" stool,
comprised of Selection of Materials, Descriptive
Cataloging/Development of Metadata, and Financial Resources.
Without support for all three, digitization does not happen. Carving
out time for staff to select objects for scanning has become more
difficult since Digital Treasures started. The world-wide financial
crisis quickly trickled down to libraries and many libraries have had to
lay off staff and reduce the working hours of others. Selecting
materials for inclusion in Digital Treasures is time-consuming even
when the local history collection is well organized, and many are not.
There are fewer people who feel comfortable with original
cataloging. Even after objects have been chosen, it is easy to get
bogged down in the process of descriptive cataloging. Digital
Treasures participants have always been encouraged to create
thorough metadata so that their images are more easily found via
keyword searches. Unfortunately, the process is overwhelming to
many, especially those who already feel time pressures.
No one thinks that the scanning fees, which are subsidized by
C/W MARS, are unreasonable. However, with tight budgets, every
penny is scrutinized. For a C/W MARS member library, adding 20
more images to its collection not only costs $170, but the annual
maintenance fee is increased by $20 - a commitment that some
libraries are unwilling to make. Financial incentives in the form of
grants can help, especially when the grants include an experienced
person to help with selection and metadata.
The Future of Digital Treasures
Currently the repository stands at 36 collections, with over 1300
items. The digital repository's contents are searchable through the
native Digital Treasures front end, the Digital Commonwealth portal,
OAIster, major Internet search engines such as Google and Bing,
WorldCat and the local C/W MARS online public catalogs. New digital
file formats, such as audio recordings, have been included in the
repository. The Digital Commonwealth presented a grant round in
2009 to small institutions in Massachusetts, allowing some libraries
to match funds to create new collections for the coming year. Another
grant round is in the works for 2010.
Knowing how much personal assistance with the selection and
metadata process can benefit libraries, C/W MARS has added the
Digital Treasures project to the University of Rhode Island (URI)
Graduate School of Library and Information Studies (GSLIS) Site
Experience database for a possible internship. While statistics show
that Digital Treasures is accessed, with nearly five million hits and
over 550,000 unique visitors in 2009, the success for the future lies in
continuing contributions and promotion of the repository.
References
C/W MARS Inc. (2009). Connecting libraries in Central and Western
Massachusetts. Retrieved on May 2, 2010 from
http://public.cwmars.org/
CDP Metadata Working Group. (2006). Dublin Core metadata best
practices, version 2.2.1. Retrieved on March 31, 2010 from
http://www.bcr.org/dps/cdp/best/dublin-core-bp.pdf
Central Massachusetts Regional Library System. (n.d.). Homepage.
Retrieved on May 2, 2010 from http://www.cmrls.org/.
Digital treasures: A Central & Western Massachusetts digital library
project. (2009). Retrieved on May 2, 2010 from
http://dlib.cwmars.org/
Dublin Core Metadata Initiative. (2008). DCMI metadata terms.
Retrieved on March 31, 2010 from
http://dublincore.org/documents/dcmi-terms/
Open Archives Initiative. (n.d.). Standards for web content
interoperability. Retrieved on March 31, 2010 from
http://www.openarchives.org/
Western Massachusetts Regional Library System. (2010). Western
Massachusetts Regional Library System - Libraries Cooperating
Communicating Sharing. Retrieved on May 10, 2010 from
http://www.wmrls.org/
Collaborative Digitization Goes Local
Ken Middleton, Mayo Taylor
(Middle Tennessee State University)
Abstract
This chapter describes our efforts to build a digitization program in
spite of limited funding. By obtaining small grants and forming
partnerships with other campus units and institutions in the Middle
Tennessee region, we have developed modest yet engaging collections
and planted the seeds for additional partnerships. The article will also
highlight the use of graduate students, efforts to streamline the
digitization process, and lessons learned.
Keywords: Collaboration, CONTENTdm, Collaborative digitization,
Oral histories, Work flow, Community projects.
Introduction
In an era when many libraries are struggling to stay relevant to today's
Facebook generation, digital projects have injected new purpose into
Middle Tennessee State University's Walker Library. Staff members
are excited about digitizing oral histories of pioneering women on
campus, documenting the agricultural history of a neighboring
county, or preserving seminal research on local rare plants. The digital
collections under development by the Library have required
collaboration with other campus entities and external institutions.
Few of the digitized objects have come from the Library's own
collections. In this chapter, we will describe the benefits of these
partnerships and how they can be nurtured. We will also explore
many of the practical issues that new digitization programs face, such
as securing staff, streamlining the digitization process, handling
diverse file formats, and customizing collections.
Efforts to collaborate with area institutions began as a natural
extension of Walker Library's involvement in Volunteer Voices
(http://volunteervoices.org), Tennessee's statewide collaborative
digital project. MTSU librarian Ken Middleton served as the co-
principal investigator of a three-year "Building Digital Resources"
grant from the Institute of Museum and Library Services (IMLS) that
coordinated the gathering of historical materials from organizations
and institutions statewide. That project, based at University of
Tennessee-Knoxville, focused on Tennessee history. Staff were placed
in each area of the state (west, middle and east) with a charge to
collect unique photos, documents and other materials from cultural
institutions. Selected items featured both major trends and figures in
Tennessee history (e.g. early settlement, Ida B. Wells), and significant
local items (e.g. Bradley County Courthouse). An important
component of the project was to relate the digitized objects to the
curriculum followed by schools in Tennessee. Walker Library was a
satellite digitization center for the middle Tennessee region (Conner,
Middleton, Carter, & Feltner-Reichert, 2009), with several librarians
and staff participating.
As the IMLS grant project ended, Walker Library began exploring
ways that it could build its own digital skills while continuing to assist
area archives, libraries, and museums to digitize unique collections.
We would quickly lose skills and momentum if we did not build on our
Volunteer Voices experience by securing the institutional support to
initiate our own projects. Noting that a key mission of the University
is to develop "mutually beneficial partnerships" (Middle Tennessee
State University, 2002) we examined appropriate models among more
experienced institutions.
In 2002, the University of Pittsburgh's Digital Research Library
began a two-year project to digitize a collection of 7,000 historic
Pittsburgh images in partnership with the Historical Society of
Western Pennsylvania, the Carnegie Museum of Art, and the
University's Archives Service Center (Galloway, 2005). As evidence of
the continued success of this approach, they have added three more
partner institutions over the last five years and now have more than
14,000 images in the collection (University of Pittsburgh Digital
Research Library, 2010). The Digital Research Library has also
collaborated with University of Pittsburgh faculty to create such
projects as American Left Ephemera and Visuals for Foreign
Language Instruction (Galloway, 2009). This program offers home
pages for each of its 70 digital collections; each collection home page
offers search and browse options, and a link to the partner
institution's home page.
IUPUI Library's Program of Digital Scholarship has collaborated
with an impressive array of campus and local institutions. The
collection websites and promotional material give every indication
that the Program of Digital Scholarship treats these institutions as
equal partners: the "Partners" page provides background information
about each institution; logos of relevant partners appear on each
collection home page; and a link for contact information appears in
each record. Its list of partners continues to grow. Building on its
initial collaboration with Conner Prairie Museum that features textiles
from the museum's collection, the Library recently announced two
Conner Prairie collections based on 3-D imaging technology.
After touting the benefits of such collaboration (e.g., cost-
effectiveness), Bishoff (2004) offers further guidance for librarians
who want to forge lasting partnerships with other cultural heritage
institutions: involve all partners from the start; respect each
institution's mission and culture; and ensure that each partner
benefits from the end product.
Building on these models we outlined a vision for our own digital
project design and received the support of our Dean to purchase a
server, large flatbed scanner and a license to CONTENTdm software.
We then leveraged that support to secure a $5000 grant from the
University Special Projects Committee to fund graduate students for
our first project, MTSU Memory. More on that collection will follow.
Institutional Context
Walker Library serves 25,000 students on a rapidly-growing campus.
Its funding and staff levels are significantly lower than most
comparable institutions within Tennessee and nationally -
approximately 60% of the average of our designated peer institutions
on a per-student basis in 2005-2006 (W. Black, unpublished data).
Digital work has been accomplished by reassigning small portions of
time for several librarians and one staff member, making extensive
use of graduate students and purchasing equipment and software on
an as-needed basis, usually at the end of the fiscal year when other
needs have been satisfied. Digital projects management has been
added to the load of our small staff. We have no programming support
for customizing CONTENTdm or working with open source programs.
This has hindered our ability to implement advanced features such as
embedded audio and video players.
Staffing
The situation for digital projects did improve recently when the
Library administration designated our working group as "the Digital
Initiatives Team," thus formalizing our role in the Library, and clearly
recognizing the roles of the team members: Ken Middleton,
Reference/Microtext Librarian (Team leader); Mayo Taylor,
Reference/Access Services Librarian; Fagdeba Bakoyema,
Reference/Web Librarian; Lynda Duke, Microtext manager; and Jo
Williams, Cataloging Librarian.
Because these team members can only devote limited, irregular
chunks of time to digital work and because they are drawn from
different departments of the library, communication is the key to
moving forward. We work hard to avoid problems by holding weekly
meetings that also include our student workers, maintaining a wiki for
the management of meeting minutes and other information, and by
sending monthly reports of major activities to the Library Dean and
Coordinators of both Library departments.
We are able to balance the restrictions on our professional staff
time by liberal use of graduate students from the Public History
program on campus. They bring to the work an awareness of the
historical and cultural significance of the collections, a desire to gain
skills that are highly relevant to their discipline and a facility with the
technology involved. Because many of these students will enter
careers in history museums or archives they also make ideal liaisons
to local historical agencies. Our graduate students have been
supported primarily by small grants received in support of specific
collections, as discussed below. We have supplemented this labor
force with an undergraduate scholarship student from the Electronic
Media Department, at no cost to the Library, and by utilizing students
employed at the Microtext service desk for projects that can be done
when they are not busy with patrons.
Software/Hardware
Staffing issues also heavily influenced our decision to purchase the
CONTENTdm digital content management software. Although its
shortcomings will be duly noted, CONTENTdm offers institutions the
ability to quickly create digital collections with minimal systems
support. In addition, at least six institutions in Tennessee are using the
system, creating opportunities for collaboration.
The Library has purchased a good mix of digitization equipment.
An Epson Expression 10000 flatbed scanner is large enough to handle
most photographs and documents that we have selected for scanning.
The Library recently purchased a Bookeye 3 planetary scanner with
OPUS software. Intended primarily for the Interlibrary Loan
department, it is available for digital projects outside standard
business hours for fragile books and large format materials. We also
have two portable scanners and a laptop computer to assist area
institutions in digitizing their materials.
Collections
Campus Partnership: MTSU Memory
In anticipation of MTSU's centennial year in 2011, MTSU
Memory (http://library.mtsu.edu/digitalprojects/mtsumemory.php)
will include photographs, documents, books, and oral histories that
document our institution's growth from a small Normal School to a
doctoral university with Tennessee's largest undergraduate
enrollment. The Library's Special Collections hold many items of
interest, including the institutional copies of yearbooks, faculty
directories and the student newspaper. However, a larger repository of
essential items is at the Albert Gore Research Center, which serves as
a de facto campus archive. An early emphasis was to consult with the
Director of the Gore Center to establish a working relationship and
procedures for pulling and scanning materials.
Collection Focus
The digital projects team wanted to maximize usage of the
collection by focusing on topical areas: the Great Depression, World
War II and aftermath (e.g., GI Bill), the Vietnam War and student
activism, and African Americans and women on campus. We expected
the majority of items in the collection to be photographs, but wanted
to include other types of primary sources (artifacts, newspaper
articles, and oral histories) to present a multifaceted portrait of the
University's history. The selection process for this collection has in
fact been rather unusual because we initiated it without knowing what
we would find, adjusting our expectations as new materials have been
revealed.
Early Implementation
Walker Library launched its installation of CONTENTdm in late
summer of 2007. Installation of the dedicated server and
implementation of the software was handled by our Systems
Librarian. Initial configuration of the CONTENTdm software was
done by Ken Middleton when our first collection, MTSU Memory, was
established. The remainder of the Digital Initiatives team members
had been recruited and we had received guidance from our Dean on
the direction that he would like us to take. Everyone on the team had
some experience with digitization projects but none had worked with
CONTENTdm. In an ideal world we would have enjoyed a beta period
of testing the system and training the graduate students. In the real
world, the clock was already ticking on a grant project and the
graduate students needed work to do. They worked on the project
more hours in most weeks than did their supervisors, leaving the
professional team members hard pressed to hone their own skills to
keep up with the graduate students. The students began selecting
and scanning materials before any of us were proficient in
using CONTENTdm and before we completed the configuration of the
MTSU Memory field structure. One example of the consequences: as
each record is created it is put in an approval queue to be reviewed
and approved by one of the librarians. If the librarian isn't able to keep
pace with the student, simple errors are repeated many times before
they are corrected, which exacts a toll in terms of time wasted on
corrections.
To slow down the rate at which items were loaded into
CONTENTdm we gave the students guidance on the types of items we
were looking for and sent them over to the Albert Gore Research
Center. They concentrated on selecting items, but without the benefit
of prior experience in completing metadata, they didn't know exactly
what information they needed to capture to assure complete
documentation of the items. The result was that many objects were
loaded with incomplete data and improper form. Predictably, we are
still returning to some of the earlier records to fill gaps and fix
inconsistencies.
By summer of 2009 we again posted a job notice with several
graduate programs on campus and again selected two Masters
candidates from the Public History program. This time we started
each of them with an introduction to the collections they would work
with and the metadata guidelines available on our wiki followed by
CONTENTdm tutorials and documentation. Finally, they began by
importing and creating metadata for objects selected by our previous
workers. Only after these initial projects were complete were the
students free to find new materials for the collections.
Expanding Opportunities
Our original vision for MTSU Memory was a collection composed
primarily of photographs and documents. However, we adjusted that
concept when given the opportunity to include excerpts of audio and
video interviews and oral histories.
Audio materials: As our graduate students investigated
materials at the Gore Research Center they discovered an excellent set
of oral histories that had been conducted with alumni from past
decades. From that extensive collection, twenty-five were selected
because of their relationship to the lead themes of our collection, such
as World War II, Civil Rights, etc. Our initial plan was to digitize and
display the typed transcript of each interview and then offer selected
audio excerpts as a supplement. When a new Director joined the Gore
Center he decided that the transcripts needed additional work to be
ready for online publication. We adapted by finding a workaround to
offer only the selected audio excerpts. It consisted of loading MP3 files
to our server, but not within CONTENTdm, importing a photo of the
subject as the digital object for the CONTENTdm record, and then
creating a metadata field to include links to the MP3 files. The work of
selecting, reviewing and excerpting the oral histories was done by one
of our graduate student workers as an internship for course credit. She
selected and downloaded the freeware program, Audacity, to manage
the sound files and participated in designing the customized process
for creating the records. We hope to include the entire transcript of
each record at some point. At present our CONTENTdm records direct
users to the items available at our partner across campus.
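The audio workaround described above can be sketched as a simple record structure. This is only an illustration in Python: the field name, the file names, and the example.org base URL are hypothetical, and nothing here calls CONTENTdm itself.

```python
from dataclasses import dataclass, field

# Hypothetical base URL for MP3 files stored on the web server,
# outside CONTENTdm (a placeholder, not the project's real address).
AUDIO_BASE_URL = "https://example.org/audio/"


@dataclass
class OralHistoryRecord:
    title: str
    image_file: str  # photo of the subject, used as the digital object
    metadata: dict = field(default_factory=dict)


def build_record(title, image_file, mp3_files):
    """Create a record whose digital object is a photo and whose
    metadata carries links to the separately hosted audio excerpts."""
    links = [AUDIO_BASE_URL + name for name in mp3_files]
    # "Audio excerpts" is an assumed name for the added link field.
    return OralHistoryRecord(title, image_file,
                             {"Audio excerpts": "; ".join(links)})


record = build_record("Interview with an alumna", "subject.jpg",
                      ["excerpt1.mp3", "excerpt2.mp3"])
print(record.metadata["Audio excerpts"])
```

Keeping the links in a metadata field means the audio files can be moved or replaced on the server without re-importing the digital object, at the cost of maintaining the links by hand.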
Video materials: Another treasure trove at the Gore Center was
a collection of interviews that were broadcast on the campus television
station in the 1980s and 1990s. Of particular interest was a series of
four lengthy interviews with Miss Mary Hall, an Education professor
important to MTSU and the related University demonstration school
from 1929 to 1963. We again turned to one of our graduate students
who had recently completed a course in documentary filmmaking. She
was able to use her personal computer (a MacBook) and software
(iMovie) to create a series of video excerpts that encapsulate
important phases of Miss Hall's life.
Book-length publications: As we researched our own campus
for the MTSU Memory project, we frequently consulted The First Fifty
Years, a campus history written by alumnus Homer Pittard, that
covered the period 1911-1961. Because MTSU held the copyright for
the book we decided to digitize the entire volume. This initiated an
intensive learning experience about the pros and cons of compound
objects and ultimately the process of creating complex PDF files. The
process of scanning the pages and doing OCR through the ABBYY
FineReader software was quite straightforward. A definite learning curve was
encountered in figuring out how to structure a compound object from
the hundreds of page images. However, once the book was loaded into
the MTSU Memory collection, we were dissatisfied with the results,
largely because queries could return dozens of individual pages
intermingled with other materials. In addition, when we figured out
the compound object expense in terms of licensed objects used, we
rethought our original approach. In brief, a 200-page book exacts a
cost of 201 objects utilized, 200 for the images and one for the
combined record. By comparison, a book imported as a PDF made
from the same original image files costs only one licensed object.
While this may seem an arcane concern, with a licensing level of only
10,000 items we could envision our capacity diminishing quickly if we
continued to do book-length items. So, The First Fifty Years was
reloaded as a PDF as was our second book, The Raider Forties, and a
thesis. Smaller multi-page items continue to be treated as compound
objects.
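The licensing arithmetic above can be written out as a short calculation. This is a minimal sketch in Python that assumes the counting rule exactly as described in this chapter (one object per page image plus one for the combined record, versus a single object for a PDF); the function names are illustrative and not part of CONTENTdm.

```python
LICENSE_LIMIT = 10_000  # licensing level mentioned in the text


def licensed_objects(page_count, as_compound):
    """Objects counted against the license for one book."""
    # Compound object: one per page image plus one combined record.
    # Single PDF: one object regardless of page count.
    return page_count + 1 if as_compound else 1


def books_that_fit(page_count, as_compound, limit=LICENSE_LIMIT):
    """How many books of this length fit under the license limit."""
    return limit // licensed_objects(page_count, as_compound)


print(licensed_objects(200, as_compound=True))   # 201
print(licensed_objects(200, as_compound=False))  # 1
print(books_that_fit(200, as_compound=True))     # 49
```

Under these assumptions, fewer than fifty 200-page books loaded as compound objects would exhaust a 10,000-object license, which is why book-length items were reloaded as PDFs.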
It is worth mentioning that the experience of the user is quite
different for audio, video and large PDF text files in our collections
when compared with objects loaded as single JPEG files, compound
objects or small PDFs. Whereas those files provide a display image
immediately upon opening the record, the other file formats all
depend upon software programs external to CONTENTdm (e.g. Adobe
Acrobat or QuickTime) to open the file. In the case of large media files
or PDFs the process can be relatively slow. The record display is also
marked by the rather unimaginative default message of "Access this
Item," which is not far removed from the archaic "click here" message
of early web pages. We hope to find a partial solution to this problem
through embedded media players.
Customizing Our Flagship Collection in CONTENTdm
Although CONTENTdm is an excellent tool for creating digital
collections quickly, it does not provide a particularly user-friendly
public interface. All searches retrieve results from all collections,
which can be problematic when you have collections as varied as
university history and ecology. We have created a separate entry page
for each collection with a brief description of the collection, a search
box, a link to an advanced search, and browsing options.
The lack of default browse options for subjects, time periods and
formats has been particularly frustrating. CONTENTdm requires us to
create the browse lists manually by running "custom queries" for
every subject, for instance, that appears in a list. This process requires
a significant amount of time. We have recently discovered Elias Tzoc's
web-based tools for creating CONTENTdm browse lists (Tzoc, n.d.).
Creating the list of 200 subject headings in MTSU Memory would
have taken approximately 10 hours if we had used custom queries.
After learning a few fairly simple steps, we created the list in just a few
minutes using Tzoc's tool.
One extremely useful feature of a recent CONTENTdm upgrade is
that search results now include facets that help users narrow their
searches. For example, a simple search retrieves a list of records that
can be narrowed by time period, subject, and genre. However, making
changes to the default facets required too much time. Rather than
simply check the desired facets in an administrative module, one has
to find the relevant line of code in a PHP file, use correct PHP syntax,
and determine the correct code for each facet. Many libraries may lack
the technical staff to make such changes.
We are making efforts to attract users through additional access
methods. Our metadata librarian has set up a profile in OCLC's digital
collection gateway to streamline the creation of WorldCat records for
our digital items. We are working at the state level to use the
CONTENTdm Multi-site Server, which would create a central site for
searching across CONTENTdm collections in the state.
Community Partnership
During 2009 the Digital Initiatives team received a $1000 grant
from the MTSU Public Service Committee to assist several community
associations with digital projects. Selected organizations were invited
to propose projects; the first application received was from the Arts
Center of Cannon County (ACCC). They had been given a collection of
nearly 1000 photographs that were taken between 1944 and 1977 by
William L. Clements, District Conservationist for the Soil
Conservation Service (SCS). He served Cannon County, a small
agricultural community just east of Murfreesboro and MTSU. The
photos documented the farming methods that were disseminated by
the SCS in that period. Significantly, each photo had descriptive
information typed onto the image back, which made the task of
describing each image quite easy.
The Clement Collection was much larger than our initial idea of
what would be handled for each project with our small grant, with
nearly 1000 photos available, but it was so compelling that we worked
to find a way to accept it. The answer came through a supplemental
$300 grant from our local Gannett newspaper and through use of a
student worker who needed to fulfill work hours for his scholarship.
The Cannon County, Tennessee Collection
(http://library.mtsu.edu/digitalprojects/cannon.php) includes approximately 400
photographs selected from the total available.
We accepted the Clement photos because they beautifully
illustrate and describe the details of the County's predominant
occupation, farming, over a period of three decades. In addition, the
collection was an ideal test case for working with area institutions to
create digital collections:
• The Arts Center is an essential institution in a community where
farming is still economically and culturally significant. Many of
the families pictured are still on their farms today.
• The Arts Center staff had already worked on a promotional plan
for the collection and had developed plans for creating an exhibit
based on the photos.
• Additional rich contextual material is available. Three hours of
recent oral history interviews with the photographer will be
included along with an interpretive essay from Evan Hatch,
folklorist with the Arts Center.
• There were no copyright issues because the photos are the product
of a federal employee.
• The photos have been relatively easy to scan and process because
the photographs are all the same size. Moreover, detailed
metadata is already recorded on the back of each photograph.
The homogeneity of the collection also provided an opportunity to
develop streamlined processes, for example by loading nine images at
a time onto the large format scanner, then utilizing Photoshop tools to
deskew and adjust each image before saving it individually. We also
were able to make optimum use of the template option within the
CONTENTdm Project Client to automate the entry of identical
information into selected metadata fields as each object is imported.
In this manner large groups of records could be created in the Project
Client and then uploaded in a batch operation for approval. The
approval process itself was efficient because only the relatively few
unique fields in each record (e.g. title, LC subject heading) needed to
be carefully checked by the approver.
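The template approach described above can be illustrated with a small sketch. It is written in plain Python and does not use the CONTENTdm Project Client; the field names and the sample title are made up for illustration.

```python
# Values shared by every photograph in the batch (hypothetical fields).
TEMPLATE = {
    "Collection": "Cannon County, Tennessee Collection",
    "Type": "Photograph",
    "Format": "image/jpeg",
}


def apply_template(template, unique_fields):
    """Combine the shared template values with item-specific values."""
    record = dict(template)       # copy so the template is not mutated
    record.update(unique_fields)  # item-specific values extend or override
    return record


def fields_to_review(template, record):
    """Fields an approver still has to check: anything not supplied
    unchanged by the template."""
    return sorted(k for k, v in record.items()
                  if k not in template or template[k] != v)


record = apply_template(TEMPLATE, {
    "Title": "Contour plowing demonstration",  # invented sample value
    "Subject": "Soil conservation",            # invented sample value
})
print(fields_to_review(TEMPLATE, record))
```

Separating shared values from unique ones is what makes approval fast: the approver only needs to look at the fields returned by `fields_to_review`.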
We have also used Elias Tzoc's web-based tool to create a tag
cloud based on subject headings in the collection. Tag clouds offer a
useful alternative to traditional subject heading lists because they
provide users with a quick sense of the major subjects in a collection.
The Digital Initiatives team has made every effort to involve the
Arts Center of Cannon County in developing the collection. Initial
meetings with Evan Hatch, the Center's folklorist, covered everything
from selection issues to scanning standards. The Center's logo appears
on the collection web site, and links to the Center's web site. Treating
the Center as an equal partner improved the quality of this initial
digital collection, and it has also increased the likelihood of future
partnerships. Early success of the project has encouraged discussions
of additional materials that may be forthcoming to document the
history and deep crafts tradition of Cannon County.
New and Forthcoming Projects
Buchanan Collection - will document the history of a local
family that has included a Nobel Laureate MTSU alumnus, a
governor, and an early settler of Nashville. The collection is being
compiled in cooperation with family members.
Cedar Glades Studies Collection - a partnership with the
MTSU Center for Cedar Glade Studies will digitize research on the
rare cedar glade ecosystems of Middle Tennessee. Most notably the
1948 doctoral thesis of Dr. Elsie Quarterman was digitized with
support from a small grant passed through the Biology Department.
Her thesis was handled by two practicum students from the UT-
Knoxville School of Information Science.
Shades of Blue and Gray: Reflections of Life in Civil War
Tennessee - will include digital images of 250 artifacts from small
archives, libraries, and museums across the state. In partnership with
Celia Walker, director of Vanderbilt University's Peabody Library, we
will create a metadata template in CONTENTdm, enter metadata, and
customize the CONTENTdm database for items collected by other
entities.
Stones River Battlefield Historic Landscapes - is digitizing
deeds and other materials to document the largely forgotten African
American community of Cemetery, which developed after the Civil
War on land that would become the Stones River National Battlefield.
Created by MTSU history professor Rebecca Conard and her graduate
students, it is hosted by the Library, and Digital Initiatives staff have
played an advisory role.
Lessons Learned
The Digital Initiatives team's collaborative approach has made it
possible to take on more projects. In turn, our list of lessons learned is
comparatively long and keeps growing as we face challenges specific to
new projects. However, as the list below illustrates, many of the
lessons apply to every digital project.
• Libraries that lack a digital projects department (and the
corresponding budget line) should consider applying for small
grants. In our case, the three grants have been enough to fund
students doing the scanning and metadata entry.
• Recruit student workers / interns with relevant skills, and offer
them a rewarding work experience. Recognizing that our public
history interns are indispensable to the success of our digital
projects, we have treated them as equal partners on the Digital
Initiatives team. We have encouraged them to attend our weekly
meetings, welcomed their suggestions for the improvement of
digital projects, and tailored their internships to their interests
and skills as much as possible.
• Be flexible. One of our best decisions has been to include
interviews in audio and video formats in MTSU Memory, in spite
of production delays. In addition, we adjusted our very modest
approach to working with area institutions when we saw the
potential of working with the Arts Center of Cannon County.
• An effective communication structure is essential, particularly for
new digitization programs. By having weekly meetings, we have
addressed metadata and technical problems before they become so
large that they are difficult to correct.
• Communication is important when working with collaborative
partners; tact, patience, and ability to explain both the process and
benefits of digitization can not only help put partners at ease, but
also pave the way for additional joint projects.
• Invest time in strategic planning. By focusing so much on the
details of specific projects, we have not devoted sufficient time to
plan for a robust approach for digital preservation.
Conclusion
Perhaps the most important lesson learned mirrors our experience
with the statewide Volunteer Voices project. Small libraries, archives
and museums have valuable collections, and often want assistance in
digitizing these collections. By sharing our successes and growing pains
in working with an academic archive, a small arts center/museum,
and a growing list of other partners, we hope to encourage other
institutions to proactively seek out collaborative digitization
partnerships at the local level.
References
Bishoff, L. (2004). The collaboration imperative. Library Journal, 129
(1), 34-35.
Conner, T.R., Middleton, K., Feltner-Reichert, M., and Carter, A.
(2009). Volunteer voices: Tennessee's collaborative digitization
program. Collaborative Librarianship, 1 (4), 122-132. Retrieved
from http://collaborativelibrarianship.org/index.php/jocl/article/view/38/26
Galloway, E. (2005). Historic Pittsburgh image collections. D-Lib
Magazine, 11 (11). Retrieved from:
http://www.dlib.org/dlib/november05/11featured-collection.html
Galloway, E. (2009). Challenges of creating digital libraries:
Digitizing, organizing, storing, accessing content. 2009 CSE
Annual Meeting, May 5, 2009. Retrieved from:
http://www.councilscienceeditors.org/events/annualmeeting09/presentations/galloway.pdf
IUPUI University Library. (2009). Library grant makes more
Indiana history accessible to all. Retrieved from
http://www-lib.iupui.edu/node/1306
Middle Tennessee State University. (2002). Academic master plan:
Middle Tennessee State University 2001-2012. Murfreesboro, TN:
Author.
Tzoc, E. (n.d.). Re-Indexing CONTENTdm metadata. Retrieved from
http://staff.lib.muohio.edu/~tzocea/files/CONTENTdm/re-indexing/
University of Pittsburgh Digital Research Library. (2010). Historic
Pittsburgh image collections. Retrieved from:
http://images.library.pitt.edu/pghphotos
Picturing the Past and Planning for the
Future: Central Florida Memory
Lee Dotson, Selma Jaskowski
(University of Central Florida Libraries)
Abstract
Central Florida Memory is a collaborative digitization project to create
a collection of primary sources related to Central Florida's history that
provides a compelling look at Central Florida before the area became a
tourist destination. Through cycles ranging from limited resources to
ample grant funding, the collaboration has grown from a small pilot
project with three partners to over 80,000 images contributed by
seven partners. Organized for the user, the website has been enhanced
to create more user friendly methods of access and includes storylines,
lesson plans, and links to other sites of interest. This paper provides
lessons learned about how to incorporate a variety of materials from
diverse institutions into a single digital collection.
Keywords: Academic library, Bethune-Cookman University, Central
Florida Memory, Collaboration, CONTENTdm, Digitization,
Historical Society, IMLS, LSTA, Museum, Museum of Seminole
County History, Olin Library of Rollins College, Orange County
Library System, Orange County Regional History Center, Public
library, Stetson University, University of Central Florida Libraries.
The Beginning - 2002
When the directors of the University of Central Florida Libraries and
the Orange County Library System came up with the idea for Central
Florida Memory (CFM) over drinks in 2001, it seemed like a friendly
way for the two institutions to collaborate on a project that would
serve both academic and public library patrons and foster inter-
institutional understanding.
With the addition of the Orange County Regional History Center,
the three institutions had the necessary resources to begin building an
exciting digital collection. In 2002, the group began meeting to
evaluate the collections of the three partners and determine what
materials in each collection would provide the most compelling and
engaging look at Central Florida's history.
Coming from very different missions and working environments,
the partners recognized early on that this collaboration would take
some work. Staffing, funding, and the capacity for participation
needed to be discussed and an understanding of each institution's
identity and purpose gained. Through quarterly meetings where
brainstorming was center stage, the group identified the strengths
each partner could bring to the project. Committees were formed and
populated with members from each institution to foster
communication among participants as well as organize functional
responsibilities. While the full group discussed and agreed on overall
structure and support, the committees focused on operational tasks
such as selection, metadata and cataloging, storyline, marketing, and
evaluation and assessment. The scope of the project was defined,
collections examined and material chosen for inclusion in what was
hoped would become the definitive site on the history of Central
Florida.
Pilot Project - 2002
With the goal of applying for grant funding, the partners conducted a
pilot project to prove the viability of such a collection. It would also
serve to lay the foundation for scanning materials at a centralized
location and create a website dedicated to Central Florida's history.
Since their vision included building a digital collection with
contributions not just from the original three partners but from
libraries, historical societies, and museums throughout Central
Florida, the pilot project would be the beginning of a regional effort to
expose hidden collections in small institutions. By digitizing materials
from organizations with wonderful collections but no digitization
resources, Central Florida Memory would be the vehicle to make these
rich resources available electronically.
To begin, materials created from 1880 to 1930 were selected from
the collections of all three institutions. The UCF Libraries offered their
digital imaging facilities and servers, Orange County Library System
lent the services of their graphic designer, and the Orange County
Regional History Center contributed an abundance of material to the
collection. The partners agreed to deliver materials for scanning to the
Digital Services unit at the University of Central Florida Libraries.
With one part-time scanning assistant, a flatbed scanner, and a digital
camera, work began on digitizing thirty-five items totaling
approximately 2,000 images from materials such as maps, letters, city
directories, monographs, and ephemera. Overall, Central Florida
Memory adhered to the recommendations set forth by the Publication
of Archival, Library, Museum Materials (PALMM) and Cornell
University. PALMM guidelines were routinely referred to for
copyright concerns and digitization standards (PALMM
Documentation, 2010). In addition, Cornell's Moving Theory into
Practice Digital Imaging Tutorial provided a strong foundation for
making decisions regarding image creation, file management, quality
control, and digital preservation (Moving Theory, 2010). Size,
condition, content, and intended end use of the item to be digitized
were several of the factors taken into consideration when determining
the most appropriate method and equipment.
Since the materials provided by the Orange County Regional
History Center and the Orange County Library System were often
fragile or rare materials, an environment of trust was essential to the
success of the project. This was especially important for the History
Center as it was not ordinarily their practice to allow materials to
leave their reading rooms, much less travel across the city.
Representatives from the partner institutions were invited to spend a
day with the scanning assistant to alleviate any concerns about the
care and handling of materials as well as have a front row seat to the
scanning process. With the small number of items involved in the
pilot, tracking materials was easily performed using spreadsheets and
email, and delivery and return of items was handled in person.
The next step was to choose a content management system for
storing, searching, and retrieving the digital images. CONTENTdm
was chosen because it could either be set up rather quickly and used
easily by all levels of staff as an out-of-the-box installation or
customized to create a unique look and feel for the collection (Lavoie,
2005). During the pilot, the installation was customized and objects
were loaded into three distinct institutional collections. When
contributing materials, partners were asked to fill out forms
describing their materials. This information was used to create the
metadata when the images were added to the collection. With a
CONTENTdm collection of close to 2,000 images and a website in
production, Central Florida Memory was born.
In order to present the collection in a way that was most
appealing to a wide audience, "stories" were included to provide
background for the images found there (Success story, 2009). The
storylines were divided into sections - Dreams and Schemes; Roads,
Rivers and Rails; and Critters, Crackers and Cottages. Each of these
sections was divided into subsections and included topics like tourism,
health, hotels, railroad, water, roads, nature, social life, people, and
place. In these subsections users would learn why people moved to
Central Florida, how they traveled, and what they experienced in this
new, harsh climate (Gronlund, 2005). Users could search and browse
the collection itself or get to images of interest through the stories.
The logo for the project and original website design were
contributed by the Orange County Library System. The pages existed
separately from CONTENTdm itself and supported the Florida
Stories, Teachers, Site Map, Contact Us, and Links parts of the site.
The link to Collection and any links in the other sections to images in
the collection took the user from the website to CONTENTdm.
Figure CFM-1. Central Florida Memory Website, 2003 (Central Florida Memory,
2003)
On 21 May 2003, the Central Florida Memory website and
collection were unveiled at a reception at the Orlando Public Library.
After presentations describing the Central Florida Memory concept,
how it was created, and plans for its future, refreshments were served
and attendees were shown how to navigate the collection on multiple
workstations set up in the meeting area. Included in the audience
were representatives from area libraries and historical societies, as
well as members of local government. Response to the new digital
collection was very favorable. Contact information for potential future
contributions was offered by several attendees.
Upon completion of the pilot project, the group applied for a
Library Services and Technology Act (LSTA) grant but was
unsuccessful. A year later, the group applied for and received a
National Leadership Grant from the Institute of Museum and Library
Services (IMLS) to expand the collection. The National Leadership
Grant was a perfect fit for Central Florida Memory given that the
grant's goals included supporting "projects that have the potential to
elevate museum and library practice" while advancing "the ability of
museums and libraries to preserve culture, heritage and knowledge
while enhancing learning" (IMLS National Leadership, 2010). As part
of that grant project, additional partners were invited to join. The Olin
Library at Rollins College in Winter Park, Florida and the Museum of
Seminole County History in Sanford, Florida were welcomed to
Central Florida Memory.
As part of the IMLS grant proposal, a formal mission statement
was developed to illustrate the partners' ongoing commitment to
building Central Florida Memory.
The mission of Central Florida Memory is to create, through collaboration among
museums, libraries, historical societies, and archives in Central Florida, a
comprehensive digital collection, serving the needs of scholars, students, teachers,
and residents. Here they will access textual and graphical images and objects that
document the region's historical and cultural heritage.
In order to fulfill its mission, CFM established the following goals:
• Provide access to the rich and varied content of libraries, museums, historical
societies, and archives throughout Central Florida.
• Create an extensive digital resource for scholars, students, teachers,
genealogists, Central Florida residents, visitors, and the general public,
promoting lifelong learning.
• Assist museums, historical societies, and libraries in digitizing their collections
and become a forum for sharing expertise and technical knowledge.
• Offer a digital archive to those museums, historical societies, and archives
unable to establish their own digital infrastructure.
• Provide an innovative digital resource for K-12 teachers and parents to bring
alive the region's history into existing and new lesson plans.
• Expand, enhance, and disseminate the collaborative structure model
developed during the CFM pilot phase.
• Contribute to the larger body of knowledge of Florida history via access to the
CFM website through other statewide project sites.
From the narrative of the IMLS grant submitted 3/31/2004 (Central
Florida Memory Grant Writing Committee, 2004).
IMLS Grant - 2004
The IMLS grant phase provided a tremendous boost for Central
Florida Memory. The funds received from IMLS made possible a
commitment to increase the collection by up to 50,000 images.
Additional activities included creating lesson plans to support K-12
education, expanding and enhancing the storylines and website,
marketing the collection through promotional materials and a video,
and presentations and poster sessions at conferences. At this juncture,
deciding how to organize the collection was critical in affecting how
the end user would search and retrieve items. Should we continue to
divide the materials based on the contributing institution, by some
other criteria, or not at all?
CONTENTdm allows either multiple collections based on
locally determined parameters or one large collection in which
metadata creates subsets of materials for users as they browse or
search. Having created three distinct collections
as part of the pilot project, we knew that there could easily be
confusion in searching and difficulties in upgrading. In the pilot
model, each CONTENTdm upgrade had to be performed
three times, once for each collection. With the addition of two new partners,
it would be five times. Given those issues and understanding that the
original intention of Central Florida Memory was to be a collection of
a variety of materials representing all institutions, we combined the
institution-specific collections into one single collection. This would
make searching across all formats and all collections simpler and
applying upgrades less complicated. However, we soon found out that
combining collections had its own challenges. Working with a wide
variety of materials in a single combined collection would impact
production processes and necessitate new approaches.
With these changes and an ambitious workload ahead, some
adjustments to the committee structure and work processes were
needed. A Steering Committee was formed of the directors of the
partner institutions for administrative purposes. An Operations
Committee was composed of institutional representatives who then
served on one or more of the seven working groups.
The Curators Working Group was responsible for selecting
materials for digitization. Taking our initial cues from the pilot
project, materials were tracked on a shared spreadsheet and itemized
lists accompanied delivered materials. It was soon discovered that one
of the differences between scanning 2,000 images and 50,000 images
was how quickly Digital Services became inundated with materials
lacking descriptions or a process for tracking their progress (Dotson,
2005). Boxes of loose photographs seemed to appear from out of
nowhere and books with similar titles were easily confused. A
Digitization Request Form was created and partners were responsible
for submitting a form for each item and arranging for delivery of the
materials to the University of Central Florida Libraries Digital
Services unit (Digitization spec kit, 2005). The form helped to
consistently track materials through the use of uniform bibliographic
information and a computer generated unique identification number.
It allowed for entry of information pertaining to the lending
institution's contact information; physical item information such as
title, author, subject, description/notes, date original, publisher,
number of pages, format, call number/identification number, and
storyline; and digitization information including number of images to
digitize and notes regarding special handling or other circumstances.
The lending institution received a confirmation page upon submission
of the request form to be printed and delivered along with the
appropriate item. The confirmation page and unique identifier
accompanied the item throughout the process from material inventory
check in to scanning to metadata. In addition to providing a
well-organized tracking system, these measures resulted in a meaningful
transition from a physical item to a digital image.
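
The information gathered by the Digitization Request Form can be pictured as a simple record. The Python sketch below is only an illustration and is not the form or database actually used by Digital Services: the class name, the field names, the `CFM-` prefix, and the sequential counter are all assumptions chosen to mirror the description above, which says only that the identifier was computer generated and unique.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical sequential counter used to mint identifiers.
_next_id = count(1)


@dataclass
class DigitizationRequest:
    # Lending institution details
    institution: str
    contact: str
    # Physical item information
    title: str
    author: str = ""
    subject: str = ""
    description: str = ""
    date_original: str = ""
    publisher: str = ""
    number_of_pages: int = 0
    item_format: str = ""
    call_number: str = ""
    storyline: str = ""
    # Digitization information
    images_to_digitize: int = 0
    handling_notes: str = ""
    # Assigned automatically when the request is created (assumed format)
    request_id: str = field(
        default_factory=lambda: f"CFM-{next(_next_id):06d}"
    )


def confirmation_page(req: DigitizationRequest) -> str:
    """Render a plain-text confirmation to print and send with the item."""
    lines = [
        "DIGITIZATION REQUEST CONFIRMATION",
        f"Request ID:  {req.request_id}",
        f"Institution: {req.institution} ({req.contact})",
        f"Title:       {req.title}",
        f"Format:      {req.item_format or 'not stated'}",
        f"Storyline:   {req.storyline or 'not stated'}",
        f"Images:      {req.images_to_digitize}",
        f"Handling:    {req.handling_notes or 'none'}",
    ]
    return "\n".join(lines)
```

Because the identifier is minted once, when the request is created, the same value can be printed on the confirmation page and reused at check-in, scanning, and metadata creation.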
It was soon discovered that the scanning, metadata, and upload
processes for some of the materials varied from the normal
production processes. As general practice, when materials were
received they were reviewed to determine the appropriate equipment
to be used for digitization. The item was scanned, metadata created,
and the images and accompanying metadata uploaded to
CONTENTdm. However, special requirements for some materials
necessitated the development of a "projects within a project" approach. In
all, five different "projects" (as discussed below) with unique
characteristics were completed and added to the collection.
Figure CFM-2. Screenshot of the Digital Services Tracking Database (Digital
Services, 2005)
Newspapers
The challenges presented by including the newspapers in a
collection of mixed materials included inconsistency in the physical
size, large format scanning, and optical character recognition for full
text search capabilities. While several newspapers were digitized
during this time, the most extensive experience was with the
Sandspur, the student newspaper of Rollins College. The Sandspur
began as a literary journal in 1894. Digitization of the first seventeen
volumes proceeded normally as each issue ran 20 to 40 pages and
measured in size from 27x20 cm to 23x15 cm. With volume 18, the
format changed to that of a more typical newspaper length and size.
These fragile bound newspapers measured anywhere from 38x27 cm
to 41x28 cm and required scanning on a Kaiser rePro graphic stand
with Karden-Linhof camera. Additional time and attention to detail
were required to complete the digitization of the newspapers. Once
scanned, the images were loaded into CONTENTdm as JPEGs. Lacking
OCR capabilities at the time, the metadata cataloger painstakingly
created page level metadata to describe the contents for each image
(Dunlop, 2005).
Funeral home records
The Undertaker's Memoranda from the Carey Hand Funeral
Home records was an important collection used extensively for
genealogical and historical research on Central Florida's people and
places. The collection contained records and ledgers on services
provided at the Carey Hand Funeral Home from 1891-1955. Forty-four
books of the Undertaker's Memoranda covering the time period from
June 15, 1891 through November 10, 1927 were made available on
Central Florida Memory. In order to make the records searchable
online, the handwritten fields had to be transcribed and linked with
the appropriate image. A set of page-level, field-by-field
metadata was created to represent information about the person, death,
family, and burial. Efforts were coordinated between Digital Services and
Special Collections so that as the books were being scanned,
information from the handwritten fields was entered into a
spreadsheet. Prior to uploading the materials, the appropriate rows on
the spreadsheet were turned into text files that contained the
associated metadata for each page. The end result was a collection of
records searchable by name, date of death, cause of death, age when
deceased, place of residence, gender, race, parents' names, and burial
place.
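
The step of turning spreadsheet rows into one text file per page can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the project actually ran: it assumes the spreadsheet was exported as CSV, that a hypothetical `image` column names the scanned page, and that a simple `field: value` layout is acceptable. The exact text format expected by CONTENTdm at upload is not described in this chapter.

```python
import csv
from pathlib import Path


def rows_to_text_files(csv_path, out_dir, image_column="image"):
    """Write one metadata text file per spreadsheet row.

    Each file is named after the page image in that row, so the text
    file and the scan can be paired at upload time.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            stem = Path(row[image_column]).stem
            target = out / f"{stem}.txt"
            # Skip the image column itself and any field left blank
            # by the transcriber.
            lines = [
                f"{name}: {value}"
                for name, value in row.items()
                if name != image_column and value
            ]
            target.write_text("\n".join(lines) + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Keying each file to the image name keeps the transcription and the scan together even when the two tasks are done by different staff at different times.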
Voter's registration records
The Orange County voter's registration records collection was
also of great interest to genealogists. It contained the voting records of
registered voters in Orange County cities from 1912-1934. All ledgers
were numbered as volumes. Each volume contained the voter's name,
age, race, occupation, local residence, party affiliation, whether a poll
tax had been paid, and nativity or place of birth. Other information
may have included the voter's declaration of naturalization, date of
voter's registration with a political party, and remarks concerning
disabled veteran voters, registration changes and deceased voters.
Figure CFM-3. Carey Hand Funeral Home Undertaker's Memorandum ("Book 24:
Carey Hand," 2005).
The voter's registration records were slated to be digitized from
microfilm to reduce the scanning workload. Unfortunately, the only
microfilm copy available had received a great deal of use in a public
library, resulting in unreadable, poor-quality images. The original
records had been in storage for many years and they had been
randomly stored in boxes. It took several days to sort and organize the
books by voting precinct before scanning or metadata creation could
begin. A graduate student was hired to assist with metadata for the
257 books that contained over 8,000 records. Since the records were
in an oversized ledger book format that read left to right across both
pages, it was quickly realized that scanning procedures would need to
be adjusted. Using the overhead scanner proved to be too time
consuming for the 8,000 plus images.
Figure CFM-4: Magnified image of Orange County Voter's Registration Record
("Orange County Voter Registration," 2005).
The most efficient method of scanning these images was
determined to be scanning the left and right pages individually and
manually merging the two into one cohesive image. For each of the
voter's registration images available, three images were created. The
resulting zoomable JPEG 2000 images allowed the user to see an overall view
of the page and zoom in to read the information line by line.
Handwritten letters
Two sets of handwritten letters, providing great insight into the
lives of settlers in early Central Florida, became the fourth "project."
The first set was penned by Arthur William Catesby Smyth. Smyth
wrote letters to his fiancée and first cousin, Beatrice Alice Smyth, while
she was living in England and he was living in Orlando. Smyth was a
partner in Smyth and Co. Orange Packers and Shippers. The letters
focus on the developing relationship of Arthur and Beatrice, but also
include some general information on the citrus industry, property
prices and social events. The other set of letters was written by a
newlywed transplant to Florida, Helen Warner, who regularly wrote
home to her mother, Mrs. Haig, who lived in Buckinghamshire,
England. The letters describe both personal matters and those of
larger interest, including the establishment of an orange grove and the
development of the city of Narcoossee. Helen and Bill Warner moved
to Florida around 1885 and stayed in Winter Park while waiting for
their house to be built in Narcoossee. Once settled, they planted an
orange grove. The letters trace their life through the birth of their
daughter and Bill's job as a steamer captain. Employment troubles
dogged the family and he attempted to sell the orange grove in June
1887. When the letters come to an end, the family had not yet made
any plans on what to do next.
Many of the letters suffered from poor paper quality and bleed
through from both sides of the pages causing the scanned cursive text
to be hard to read. Time-consuming and painstakingly detailed image
editing was conducted to enhance the natural lines of the text while
omitting letters fading through from the opposite side. Once
acceptable image quality had been achieved, the next step was to make
the content of the letters accessible by a full text search. Since the
letters were handwritten, no OCR software was available, which
limited full text options to uploading text transcripts to a large free
text searchable field. Luckily, CONTENTdm provided such a field and the
History Center had hard copy typed transcripts available with the
letters. Because no electronic version of the transcripts existed, they
were rekeyed and saved as individual text files appropriate for
uploading with the images.
Outsourcing
The digitization of certain newspapers and college catalogs
necessitated the outsourcing of certain functions such as microfilm
scanning and optical character recognition. Lacking the original hard
copies normally preferred for scanning, the microfilm versions of the
Rollins College catalogs, St. Cloud Tribune, and Lochmede newspaper
were deemed to be of good enough quality for digitization. In addition
to microfilm scanning, optical character recognition for some previously
scanned newspapers and monographs, such as city and telephone
directories, was outsourced as well. Several vendors were contacted
and iArchives was selected to perform the services. With a proven
history of creating high quality images and "highly accurate and
patented OCR software which consistently produces higher accuracy
results when compared to the 'off the shelf' OCR engines," iArchives
stood out from the rest (iArchives, 2010). During early discussions
about metadata and file naming conventions, it was quickly realized
that normal outputs differed slightly from the outsourced deliverables.
Several test batches of materials were processed to give a better
understanding of what could be expected. While some
accommodations could be made, other aspects were fixed components
of the iArchives system. Additional processing was necessary to bridge
the gap between what the vendor could provide and what the
expectations were for adding material to Central Florida Memory.
However, one of the most basic differences was one that was not
anticipated. The vendor's final images were delivered in a compressed
format. While it did not affect the quality of the images, software had
to be acquired and extra time was required to extract the individual
files.
During the IMLS grant phase, over 50,000 images were added to
the collection and the website was redesigned. Focusing on creating a
single collection rather than multiple collections defined by
institution, the site itself began to rely more heavily on the contents of
the collection. The first web design for CFM featured pages that were
essentially a series of linked pictures that gave access to CONTENTdm
and could not be easily updated. That design was improved upon in
2004 when the pages were integrated with CONTENTdm. This gave
the website a more cohesive look and made changes to its content
easier. Unfortunately, it was found that upgrading CONTENTdm
usually "broke" the design and adjustments and testing needed to be
done each time.
To cap off the grant year, a workshop was held for partners,
potential partners, and others working in the digital collection field.
Presentations focused on what had been and could be accomplished in
building digital collections. The mission and goals of CFM were
outlined and attention paid to how the project activities were
organized and carried out. By providing examples of materials added,
equipment and software acquired, and lesson plans created,
institutions were shown how they could start their own projects.
Work continued on the project after grant funding ended. Orange
County Library System designed and implemented a CFM presence in
Second Life while the UCF Libraries maintained and upgraded
CONTENTdm and continued scanning activities.
LSTA Grant - 2008
In 2008, we applied for, and received, an LSTA grant to expand the
collection by 30,000 images and our geographical coverage to include
Volusia County. Similar to the previously funded IMLS grant, the
LSTA grant was a good match for CFM as it was "a state-based
program with a broad mandate to use technology to bring information
to people in innovative and effective ways" (Library services, 2008).
Bethune-Cookman University in Daytona Beach, Florida and Stetson
University in DeLand, Florida were invited to become partners and to
contribute material. Bethune-Cookman's contribution of historically
significant images to CFM would increase the representation of early
Central Florida's African-American community. Stetson's contribution
of images and materials expanded CFM's geographic coverage beyond
Orange and Seminole counties. To manage new contributions and an
increased variety of materials, the Digital Projects Management
System was updated. As before, the new system tracked materials
through the use of uniform bibliographic information and a computer-
generated unique identification number. Once an item was received,
the new Digital Projects Management System then provided the
means to monitor the item's progress by tracking staff input in the
form of checklists and text entries at various stages of the production
process. From monitoring data and statistics to exporting metadata,
the system enabled skillful handling of a large number of unique
requests.
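
The checklist-style monitoring described above can be pictured with a small sketch. It is an assumption-laden illustration rather than a description of the Digital Projects Management System itself: the stage names, the `TrackedItem` class, and the rule that stages must be completed in order are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical production stages, in the order an item passes through them.
STAGES = ("check-in", "scanning", "image correction", "metadata", "upload")


@dataclass
class TrackedItem:
    request_id: str
    title: str
    # Maps each completed stage to the staff note entered for it.
    completed: dict = field(default_factory=dict)

    def current_stage(self):
        """Return the first stage not yet completed, or None when done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

    def complete(self, stage, note=""):
        """Tick off a stage; stages must be completed in order."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        expected = self.current_stage()
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed[stage] = note


def stage_counts(items):
    """Count how many items are waiting at each stage."""
    counts = {stage: 0 for stage in STAGES}
    counts["done"] = 0
    for item in items:
        counts[item.current_stage() or "done"] += 1
    return counts
```

A summary such as `stage_counts` is one way the "monitoring data and statistics" role could be served: it shows at a glance where materials are piling up.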
During the LSTA grant, the UCF Libraries' Digital Services unit
was responsible for scanning and processing over 28,000 images.
Previous experience had taught us to be flexible about production
procedures and to prepare for the unexpected. Images were routinely
reviewed for quality control and adjustments made to equipment and
lighting. Due to the complex nature of scanning photographic and
visual materials, it was decided that multiple equipment options
would be employed to guarantee the highest quality images while
protecting the original materials. By the end of the grant, four
different types of scanning equipment had been necessary to carefully
and accurately capture each image. Kodak i200 flatbed scanners were
used to scan photographs in good condition and loosely bound
materials. A newly purchased CopiBook overhead scanner proved
effective for dealing with large photographs, yearbooks, catalogs, and
unbound newsprint. Oversized and highly reflective materials were
scanned on the Kaiser rePro graphic stand with Karden-Linhof
camera. Tightly bound and moderately reflective items, such as bound
glossy newspapers, were scanned on the Avision FB6080E A3
Bookedge flatbed scanner.
Once materials were scanned and images corrected, metadata was
created by reviewing both the physical item and the scanned images.
Since efficiencies in uploading small compound objects (one or two
images) were achieved early on by using tab-delimited text files to
upload preliminary metadata, this method was applied to larger
compound objects as well. Controlled vocabularies were applied
to 18 fields, including author, photographer, subject, subject-personal
names, publisher, studio, format, format-medium, subject-topic,
repository, repository collection, type, language, coverage-spatial,
coverage-temporal, funding source, date contributed, and CFM
storyline. An important benefit of using controlled vocabulary fields is
that they allow the end user to perform a new search by linking
metadata as phrases rather than individual words. In addition to
making the images available to the public, master images were
submitted to the Florida Digital Archive, "a cost-effective, long-term
preservation repository for digital materials in support of teaching and
learning, scholarship, and research in the state of Florida" (Florida
Digital Archive, 2010). Once images were uploaded to Central Florida
Memory, metadata records were exported and coupled with the
respective images to create packages to be sent to the Florida Digital
Archive for long term digital preservation.
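
Preparing preliminary metadata as tab-delimited text, with controlled fields checked before upload, might look like the sketch below. It is offered as an illustration only: the function name, the sample field names, and the validation rule are assumptions, and the column layout CONTENTdm actually required is not specified in this chapter.

```python
import csv


def write_tab_delimited(records, fieldnames, controlled, handle):
    """Write records as tab-delimited text after checking controlled fields.

    `controlled` maps a field name to the set of terms allowed in it.
    Nothing is written if any record uses a term outside its vocabulary.
    """
    problems = []
    for number, record in enumerate(records, start=1):
        for name, allowed in controlled.items():
            value = record.get(name, "")
            if value and value not in allowed:
                problems.append(
                    f"record {number}: {name}={value!r} not in vocabulary"
                )
    if problems:
        raise ValueError("; ".join(problems))

    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Missing fields are written as empty cells.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
```

Rejecting an unknown term before the file is written keeps variant spellings out of the controlled fields, which is what lets a linked term retrieve every item that shares it.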
Figure CFM-5: Central Florida Memory Website, 2009 (Central Florida Memory,
2009)
The website was completely redesigned using a different color
palette and employing a cleaner look. CONTENTdm displays were
pulled into the web pages but kept separate from them. The site could
be updated and content changed easily and not be affected by any
upgrades to the CONTENTdm software.
The Future
If the partners of Central Florida Memory have learned anything over
the last seven years it is that whether or not funding is awarded, the
project goes on. In funding years an extraordinary amount of work has
been accomplished, and equipment and resources acquired. During
the lean years when there is no funding, the work continues on a much
smaller scale, but it goes on. The partners of CFM continue to meet
quarterly, formulate new ideas on how to build the collection, improve
access to the collection, and spread the message that this resource
exists. New funding sources are being identified, new grants will be
written, and new ideas will be implemented. The partners of CFM
continue to believe in their commitment to the project and to each
other.
References
Book 24: Carey Hand Funeral Home records, April 1, 1924 - August
21, 1924. (n.d.). Central Florida Memory. Retrieved August 14,
2005, from http://www.cfmemory.org
Central Florida Memory grant writing committee. (2004). Central
Florida Memory. [Institute of Museum and Library Services grant
proposal]. Orlando, FL. Retrieved December 1, 2009, from
http://www.cfmemory.org/Share/About/Narrative.pdf
Central Florida Memory website. (2003). Retrieved May 11, 2003,
from http://centralfloridamemory.lib.ucf.edu.
Central Florida Memory website. (2005). Retrieved September 20,
2005, from http://www.cfmemory.org
Central Florida Memory website. (2009). Retrieved March 13, 2010,
from http://www.cfmemory.org
Digitization Spec Kit. (2005). Retrieved March 20, 2010, from
http://www.cfmemory.org/Share/Workshop/DigitizationSpecKit.pdf
Digital Services Tracking Database. (2005). Internal tracking
database used at University of Central Florida Libraries. Orlando,
Florida.
Dotson, L. (2005). See how it's done. Central Florida Workshop for
Museum & Library Personnel. Retrieved March 5, 2010, from
http://www.cfmemory.org/Share/Workshop/Behind%20the%20Scenes%20l.ppt
Dunlop, D. (2005). How it all connects II. Central Florida Workshop
for Museum & Library Personnel. Retrieved March 19, 2010, from
http://www.cfmemory.org/Share/Workshop/Metadata%20Part%202.ppt
Florida digital archive. (2010). Retrieved March 8, 2010, from
http://www.fcla.edu/digitalArchive/
Gronlund, G. (2005). Weaving the story. Central Florida Workshop
for Museum & Library Personnel. Retrieved March 27, 2010, from
http://www.cfmemory.org/Share/Workshop/CENTRAL%20FLORIDA%20MEMORY%20WEAVING%20THE%20STORY%209%2030%202005%202.ppt
iArchives process. (2010). Retrieved February 20, 2010, from
http://www.iarchives.com/process.shtml
IMLS National Leadership grants. (2010). Retrieved February 17,
2010, from
http://www.imls.gov/applicants/grants/NationalLeadership.shtm
Lavoie, J. (2005). Showing it to the world. Central Florida workshop
for museum & library personnel. Retrieved February 23, 2010,
from
http://www.cfmemory.org/Share/Workshop/Showing%20it%20to%20the%20World.ppt
Library services and technology act grants. (2008). Retrieved
January 14, 2010, from
http://dlis.dos.state.fl.us/bld/grants/LSTA/LSTAinfo.pdf
Moving theory into practice digital imaging tutorial. (2010).
Retrieved March 15, 2010, from
http://www.library.cornell.edu/preservation/tutorial
Orange County voter registration. (n.d.). Central Florida Memory.
Retrieved August 15, 2005, from http://www.cfmemory.org
PALMM documentation. (2010). Retrieved March 11, 2010, from
http://fclaweb.fcla.edu/node/572/
Success story: Central Florida Memory. (2009). In A new story
based approach to digital collection [OCLC CONTENTdm
brochure]. Retrieved March 5, 2010, from
http://www.oclc.org/us/en/services/brochures/211595usc_F_CentralFloridaMemory.pdf
Apollo 13.0: Digitizing Astronaut Jack
Swigert's Apollo Documents
Andrew Weiss (Fort Hays State University, Forsyth Library)
Abstract:
The Kansas Cosmosphere and Space Center and Fort Hays State
University Forsyth Library joined together in 2008 to digitize the
personal archive of Apollo 13 astronaut John L. (Jack) Swigert. The
documents (1966-1970) include blueprints and proofs of the Apollo
Operations Handbook Malfunction Procedures, which Swigert was
called upon to revise in the aftermath of the Apollo 1 disaster in early
1967. The partnership between the KCSC and Forsyth Library has
been fruitful in the area of public relations and dissemination of
digital collections online. Some of the obstacles related to the
partnership have yielded positive results. As a result of the
partnership, we created a standard for uniform file names for Forsyth's
Digital Collections, adopted and customized Goddard Core, a
variation of Dublin Core, and developed an External Partnership
Protocol to improve future collaborative projects.
Keywords: Apollo Operations, Best practices, Collaboration,
CONTENTdm, Digitization, Dublin Core, Goddard Core, Handbook,
Malfunction procedures, Museum-library partnership, Space
exploration.
Introduction
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
--T.S. Eliot, Four Quartets
While orbiting the moon in 1968, Apollo 8 astronaut Bill Anders
photographed Earthrise, one of the iconic images of the Apollo
missions. Fragile and tiny, the earth appears "almost as a disk" in a
dark void and concretely shows that humans are more minuscule than
imagined (Brooks, Grimwood & Swenson, 1979, p. 277). Although
later missions would overshadow Apollo 8, including the successful
moon-landing of Apollo 11 and the successful failure of Apollo 13, the
lasting legacy of this mission was a sudden appreciation for the Earth
itself (Brooks et al., 1979, p. 366). The irony that a photograph of the
Earth would become one of the most lasting images of the Apollo
Program was not lost on the astronauts. Anders later remarked, "We
came all this way to explore the moon, and the most important thing
is that we discovered the earth" (Dordain, 2009, para. 6).
This image, later adopted by environmentalists, eventually
contributed to the general Zeitgeist that the mundane should take
precedence over impractical dreams. Indeed, once an American had
walked on the moon, as much a display of American realpolitik
might as a display of American ingenuity, public and political interest
in the Apollo Program waned. By 1973 its funding was on its last legs
and never again would the Apollo Program send an astronaut to the
moon (McKie, 2008; Brooks et al., 1979, p. 366). Yet if media
coverage is any indication of an issue's timeliness, the 40th
anniversary of the successful moon landing in 2009 proves that the
spirit of the time remains vital. General interest in space exploration
remains high among specialists and the general public alike.
Furthermore, the improvement of digital technology is allowing us
unprecedented access to the primary materials of important historical
events, making the discoveries of the time even more immediate as
they appear online. In essence, this is an exciting time to revisit the
solid-state era of the Space Race of the 1950s, '60s and '70s through
the power of digital technology.
In its partnership with the Kansas Cosmosphere and Space
Center (KCSC) to digitize the papers of John L. (Jack) Swigert, Fort
Hays State University's Forsyth Library has had the privilege to work
with documents of great importance to American and world history.
At the same time we find ourselves looking inward and realize that an
emphasis on the "mundane" still proves to be important. As a result of
our work, we realize the need for a strong foundation in the
development of sound policies and best practices that can work in
reality. As the Apollo astronauts discovered, the point of a journey is
not always to arrive; it's to find out where you've been.
Background
The Kansas Cosmosphere & Space Center (KCSC):
Second only to the Smithsonian's Air and Space Museum in terms
of collection size, the Kansas Cosmosphere and Space Center (KCSC),
located in Hutchinson, Kansas, is one of the leading collectors of space
artifacts in the United States; the items collected, many of which have
flown in space, provide a complete overview of rocketry and space
history from its origins in Nazi-era Germany, to the Space Race of the
1950s and 1960s, and to the era of detente between Russia and
America in the 1970s. Items housed include complete German V-1 and
V-2 rockets, a Redstone Nuclear Warhead, a Titan rocket, a Russian
Vostok Spacecraft, the Mercury Liberty Bell 7 spacecraft, and the
Apollo 13 Command Module. Their holdings also include the largest
collection of Russian Space Program artifacts outside of the former
Soviet Union (Kansas Cosmosphere, 2009).
FHSU, Forsyth Library & FLDCI
Forsyth Library is a small academic library serving the needs of
Fort Hays State University's 11,000 full- and part-time students and
275 faculty & staff members. Roughly 4,000 of those students are
traditional on-campus undergraduate and graduate students. The
other 7,000 students study through FHSU's Virtual College, an online
degree program. The library is central to the Virtual College and
provides assistance to those students in the form of a dynamic web
presence that facilitates off-campus access to our OPAC, databases,
and digital collections. This digital content includes the digitized
Swigert documents, our Master's Thesis Collection, letters from
Benjamin Franklin and historic glass-plate negatives collections. Our
first collection of materials was digitized in 2004. The Forsyth Library
Digital Collections Initiative (FLDCI) was established in 2008 in order
to facilitate the development and preservation of digital content.
Under the development of this framework of guidelines, we have been
able to create full-fledged digitization partnerships both within our
university, including the Sternberg Museum and the Graduate School,
and externally with institutions in Kansas. Our modest digital
collections are positioned to grow exponentially in the next few years,
and the establishment of the FLDCI promises to simplify the growth
process.
Partnership Details
The partnership between the Kansas Cosmosphere and Space
Center and Forsyth Library began tentatively in 2003 with the
proposal made by library director John Ross to digitize materials held
by the KCSC. The primary goal for this first joint partnership was to
scan, catalog and upload images from the Mercury Project Missions
into an online content management system. The digitization team
included the KCSC archivist, Kiersten Latham, Forsyth Library
cataloger, Jerry Wilson, and Forsyth Library Archivist, Patty Nicholas.
By the end of November 2004 the majority of images, which included
photographs of the Mercury spacecraft, astronauts, and engineers,
had been scanned.
Early enhancements to the project began with the use of Dublin
Core, which would later influence the presently used scheme. The
second development was the decision to purchase CONTENTdm as
the vehicle for the online dissemination of digital content.
CONTENTdm was eventually purchased and installed for use in May
2005. However, personnel changes at the KCSC and time limitations
for Forsyth Library staff impacted both the development of image
metadata and digitization. Consequently, no digital images from the
Mercury Project were placed online until the project was revisited in
March 2010.
In early 2008, in consultation with the new president and CEO of
the KCSC, Chris Orwoll, and the new Collections Manager, Meredith
Miller, another partnership was proposed by Forsyth Library director,
John Ross, with the aim to digitize the archive of Apollo 13 astronaut
John L. (Jack) Swigert. It was Swigert who had helped to revise the
Apollo Operations Handbook Malfunction Procedures after the Apollo
1 disaster of January 27, 1967. This collection of 10 archival boxes,
each dedicated to a single spacecraft system, included unique
unpublished materials, and represented an opportunity to refuel our
partnership. As soon as copyright clearance was secured, we were
given the green light to proceed.
During the three years that elapsed between the two projects,
CONTENTdm had been successfully implemented by Forsyth Library
Systems Technician, Heath Bogart, and was being used for Forsyth
Library Archives collections. Two digitization team members had been
added to improve the organization and workflows of the suddenly
burgeoning digitization projects, including new Cataloger and Digital
Content Specialist, Sherry Severson, and new Digital Collections
Librarian, Andrew Weiss. From August 2008 through May 2010, the
entire collection of materials in Swigert's archive was digitized, and
95% of these items were fully cataloged using Goddard Core, an
elaboration of the Dublin Core metadata scheme developed by NASA's
Goddard Museum.
Digitization Goals
This collection will help researchers and lay-people alike to see
the program as a work-in-progress, something which can get lost as
the Apollo mission becomes cemented in history. Our wider goal is to
develop a collaboration model that will establish a set of procedures to
use between a small academic library and a major museum. We
believe this will eventually contribute to the foundation of a larger
digitization project involving all phases of national and international
space exploration extending from World War II to the mid-1970s.
Apollo Mission Background:
If we die, we want people to accept it. We're in a risky business,
and we hope that if anything happens to us it will not delay the
program. The conquest of space is worth the risk of life.
- Gus Grissom (Barbour, 1969, p. 125)
Along with May 25, 1961, when John F. Kennedy's speech committed
the United States to the "Space Race," January 27, 1967 is arguably
the most important date in the Apollo Program. On that
tragic day, Virgil "Gus" Grissom, Ed White, and Roger Chaffee
became the first astronauts to die in the American Space Program.
Their Command Module suddenly caught fire while the three were
inside conducting a "Plugs Out" launch pad test one month before
their scheduled mission (Bilstein, 1996, p. 340). Because of design
flaws, the hatch could not be opened by the astronauts or blown open
externally. The shocking part of the accident was that it occurred
where none was expected: on Earth. Though a terrible disaster, the
Apollo 1 accident fundamentally changed the Apollo Program and, in
essence, prevented further accidents from happening while in space
(Brooks et al., 1979, pp. 214-225).
After the accident of January 27th, a full investigation was
conducted to determine its cause. During the
investigation each piece of the Command Module was removed, using
an identical copy of the Command Service Module (#014) as a guide.
The engineers at NASA and contractor North American Aviation, Inc.
(NAA) narrowed the cause down to two main factors. The first factor:
the tests were being done in excessively high oxygen pressure,
increasing the potential for fire. The second factor: too much Velcro
was being used to tie wiring together. The combination of these two
factors, along with the spark from a faulty wire and the flaw in the
hatch design, ironically implemented as a safety precaution, was
determined to be the cause of the accident (Orloff, 2004, para. 61;
Brooks et al., 1979, pp. 214-225).
The investigation marked a turning point in the development of
the Apollo Program. In the early 1960s, when the mission to the moon
became a national priority, NAA and NASA had agreed to a two-tiered
project. Tier one would develop a Block I spacecraft (1962–1968),
which was to be used for unmanned boilerplate missions and
suborbital rocket testing, and tier two would develop a Block II spacecraft
(1964–1973), which was to be used for manned earth- and lunar-orbital
flights and the moon-landing missions (Brooks et al., 1979, p.
229). The two phases overlapped from 1964 through
1968, but once the Apollo 1 accident occurred, Block I was phased out,
being used only for the unmanned Apollo 4, 5 and 6 missions until early
1968 (Brooks et al., 1979, p. 232). In order to redesign Block II, both
engineers and astronauts were called upon to completely review the
blueprints, diagrams and procedures for every aspect of the Apollo
spacecraft. Included among these specialists was Astronaut John L.
(Jack) Swigert, who reviewed the Apollo Spacecraft Malfunction
Proc